A Neural Architecture for Detecting Identifier Renaming from Diff

Qiqi Gu, Wei Ke

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

2 Citations (Scopus)

Abstract

In software engineering, code review controls code quality and prevents bugs. Although many commits to a codebase add features, some are pure refactoring, including the renaming of identifiers. Reviewing a refactoring takes somewhat different effort from reviewing a functional change. For instance, the reviewer of a renaming has to make sure that the new name is more descriptive, follows the naming convention of the institution, and does not collide with any other identifier. In this paper we propose a machine learning model that automatically identifies commits consisting of pure identifier renaming from the diff files alone. This technique helps code review enforce the naming and coding conventions of the institution and lets quality assurance testers focus on functional changes. The traditional way of detecting such changes parses the full source code before and after the commit, which is less efficient and requires the code to be syntactically complete and correct. In contrast, our novel approach based on neural networks reads only the diff and gives a confidence value for whether the commit is a renaming. Since no labeled dataset of repository commits existed, we labeled a dataset with more than 1,000 repositories from GitHub using Java syntax analysis. We then trained a neural network to classify these commits as renaming or not, obtaining a test accuracy of 85.65% and a false positive rate of 2.03%. The methods in our experiment are also relevant to general static analysis with neural network approaches.
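To make the task concrete, here is a minimal Python sketch of what "pure identifier renaming, judged from the diff alone" can mean. It is a naive token-level heuristic written for illustration, not the neural model the paper proposes, and it returns a yes/no answer rather than a confidence value. The function name `is_pure_rename`, the tokenizer, and the small keyword list are all assumptions introduced here.

```python
import re

# Hypothetical, simplified tokenizer: identifiers, integer literals, or single symbols.
TOKEN = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*|\d+|\S")
IDENT = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*")
# Small illustrative subset of Java keywords, so `int` -> `long` is not a "rename".
KEYWORDS = {"int", "long", "float", "double", "boolean", "void", "class",
            "public", "private", "static", "final", "return", "if", "else"}


def is_pure_rename(diff_text):
    """Return True if every changed line of a unified diff differs only by a
    consistent substitution of identifiers (naive heuristic, assumption-laden)."""
    removed, added = [], []
    for line in diff_text.splitlines():
        # Skip file headers and hunk headers (simplified detection).
        if line.startswith(("--- ", "+++ ", "@@")):
            continue
        if line.startswith("-"):
            removed.append(TOKEN.findall(line[1:]))
        elif line.startswith("+"):
            added.append(TOKEN.findall(line[1:]))
    # Assume removed and added lines pair up one-to-one, in order.
    if not removed or len(removed) != len(added):
        return False
    mapping = {}
    for old_tokens, new_tokens in zip(removed, added):
        if len(old_tokens) != len(new_tokens):
            return False
        for old, new in zip(old_tokens, new_tokens):
            if old == new:
                continue
            # Any difference must be identifier -> identifier, never a keyword.
            if not (IDENT.fullmatch(old) and IDENT.fullmatch(new)):
                return False
            if old in KEYWORDS or new in KEYWORDS:
                return False
            # The same old name must always map to the same new name.
            if mapping.setdefault(old, new) != new:
                return False
    # At least one rename, and no two old names merged into one new name.
    return bool(mapping) and len(set(mapping.values())) == len(mapping)


rename_diff = """--- a/Counter.java
+++ b/Counter.java
@@ -1,4 +1,4 @@
 class Counter {
-    int cnt = 0;
-    void inc() { cnt++; }
+    int count = 0;
+    void inc() { count++; }
 }
"""
print(is_pure_rename(rename_diff))
```

A heuristic like this breaks down quickly on reordered lines, reformatted code, or renames mixed with other edits, which is part of the motivation for a learned classifier that reads the diff directly.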

Original language: English
Title of host publication: Intelligent Data Engineering and Automated Learning - 22nd International Conference, IDEAL 2021, Proceedings
Editors: David Camacho, Peter Tino, Richard Allmendinger, Hujun Yin, Antonio J. Tallón-Ballesteros, Ke Tang, Sung-Bae Cho, Paulo Novais, Susana Nascimento
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 33-44
Number of pages: 12
ISBN (Print): 9783030916077
Publication status: Published - 2021
Event: 22nd International Conference on Intelligent Data Engineering and Automated Learning, IDEAL 2021 - Virtual, Online
Duration: 25 Nov 2021 - 27 Nov 2021

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 13113 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 22nd International Conference on Intelligent Data Engineering and Automated Learning, IDEAL 2021
City: Virtual, Online
Period: 25/11/21 - 27/11/21

Keywords

  • Code review
  • Diff classification
  • Neural networks
  • Program analysis
  • Supervised learning
