TY - CONF
T1 - A Neural Architecture for Detecting Identifier Renaming from Diff
AU - Gu, Qiqi
AU - Ke, Wei
N1 - Publisher Copyright: © 2021, Springer Nature Switzerland AG.
PY - 2021
Y1 - 2021
N2 - In software engineering, code review controls code quality and prevents bugs. Although many commits to a codebase add features, some are pure refactorings, such as the renaming of identifiers. Reviewing a refactoring takes a somewhat different effort from reviewing a functional change. For instance, when an identifier is renamed, the reviewer has to make sure that the new name is more descriptive, follows the naming convention of the institution, and does not collide with any other identifier. In this paper we propose a machine learning model that automatically identifies commits consisting purely of identifier renaming, using only the diff files. This technique helps code review enforce the naming and coding conventions of the institution, and lets quality assurance testers focus on functional changes. The traditional way of detecting such changes parses the full source code before and after the commit, which is less efficient and requires the code to be syntactically complete and correct. In contrast, our neural-network-based approach reads only the diff and outputs a confidence value indicating whether the commit is a renaming. Because no labeled dataset of repository commits existed, we labeled a dataset drawn from more than 1,000 GitHub repositories by Java syntax analysis. We then trained a neural network to classify these commits as renaming or not, obtaining a test accuracy of 85.65% and a false positive rate of 2.03%. The methods used in our experiment are also relevant to general static analysis with neural network approaches.
KW - Code review
KW - Diff classification
KW - Neural networks
KW - Program analysis
KW - Supervised learning
UR - http://www.scopus.com/inward/record.url?scp=85126441555&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-91608-4_4
DO - 10.1007/978-3-030-91608-4_4
M3 - Conference contribution
AN - SCOPUS:85126441555
SN - 9783030916077
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 33
EP - 44
BT - Intelligent Data Engineering and Automated Learning - 22nd International Conference, IDEAL 2021, Proceedings
A2 - Camacho, David
A2 - Tino, Peter
A2 - Allmendinger, Richard
A2 - Yin, Hujun
A2 - Tallón-Ballesteros, Antonio J.
A2 - Tang, Ke
A2 - Cho, Sung-Bae
A2 - Novais, Paulo
A2 - Nascimento, Susana
PB - Springer Science and Business Media Deutschland GmbH
T2 - 22nd International Conference on Intelligent Data Engineering and Automated Learning, IDEAL 2021
Y2 - 25 November 2021 through 27 November 2021
ER -