An Empirical Study on the Relationship Between Defects and Source Code’s Unnaturalness

Computer science, Source code, Empirical research, Programming language, Mathematics, Statistics

Authors

Yanjie Jiang, Hui Liu, J. Liu, Yuxia Zhang, Weixing Ji, Hao Zhong, Lu Zhang

Source

Journal: ACM Transactions on Software Engineering and Methodology [Association for Computing Machinery]
Date: 2025-02-18


Abstract

Natural languages are “natural” in that texts written in them are repetitive and predictable. Recent research indicates that programming languages share this characteristic (naturalness): source code displays patterns of repetition and predictability. Notably, studies have shown that buggy code deviates from these natural patterns, being significantly less natural than bug-free code. In this paper, we conduct a large-scale, extensive empirical study to investigate whether code defects lead to the unnaturalness of source code. Unlike existing studies, we leverage multiple large-scale, high-quality bug repositories in which bug-irrelevant changes in bug-fixing commits have been explicitly excluded. The studied software applications cover different programming languages, and the study involves real-world software defects as well as defects injected automatically with well-known mutation operators. On the one hand, our evaluation results confirm existing studies in that buggy source code lines are often less natural than bug-free ones. On the other hand, our evaluation reveals some interesting new findings. First, fixing bugs does not significantly improve the naturalness of code lines: on average, the fixed lines are as unnatural as the buggy ones. This finding may suggest that software defects are not the root cause of source code’s unnaturalness, although there does exist a statistically significant correlation between software defects and source code’s naturalness. Second, defects in different programming languages have a similar effect on source code’s naturalness. The conclusions (i.e., buggy code is less natural, but fixing the bugs does not improve source code’s naturalness) hold regardless of the programming language. Third, injecting defects automatically with well-known mutation operators does not significantly reduce the naturalness of the involved source code lines. This suggests that automatically injected defects may have a similar impact on the naturalness of source code as real-world defects inadvertently introduced by developers. Fourth, the defects’ impact on source code’s naturalness varies slightly among different categories of software defects. Although fixing bugs does not, on average, significantly improve the naturalness of the involved source code, fixing “checking”-related bugs does significantly improve it. Finally, locating buggy code lines according to naturalness alone is inaccurate, resulting in extremely low precision (less than one percent).
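The abstract does not state how naturalness is measured. A common operationalization in the naturalness literature is the average per-token cross-entropy of a line under an n-gram language model trained on code: lower entropy means the line is more predictable, and hence more natural. The sketch below is illustrative only and is not the method used in the study. It assumes a bigram model with add-one (Laplace) smoothing, a naive whitespace tokenizer, hypothetical names (`BigramModel`, `line_entropy`, `precision_at_k`), and a toy corpus. It also ranks lines by entropy and flags the top k as suspicious, to show how a naturalness-only bug locator works and why it can misfire.

```python
from __future__ import annotations

import math
from collections import Counter


def tokenize(line: str) -> list[str]:
    """Hypothetical tokenizer: whitespace split plus line-boundary markers."""
    return ["<s>"] + line.split() + ["</s>"]


class BigramModel:
    """Bigram language model over code tokens with add-one (Laplace) smoothing."""

    def __init__(self, training_lines: list[str]) -> None:
        self.context_counts: Counter = Counter()  # how often each token precedes another
        self.bigram_counts: Counter = Counter()   # (previous, current) pair counts
        vocab: set[str] = set()
        for line in training_lines:
            tokens = tokenize(line)
            vocab.update(tokens)
            self.context_counts.update(tokens[:-1])
            self.bigram_counts.update(zip(tokens, tokens[1:]))
        # One extra slot reserves probability mass for unseen tokens.
        self.vocab_size = len(vocab) + 1

    def prob(self, prev: str, tok: str) -> float:
        """Smoothed conditional probability P(tok | prev)."""
        return (self.bigram_counts[(prev, tok)] + 1) / (
            self.context_counts[prev] + self.vocab_size
        )

    def line_entropy(self, line: str) -> float:
        """Average per-token cross-entropy in bits; higher means less natural."""
        tokens = tokenize(line)
        pairs = list(zip(tokens, tokens[1:]))
        return -sum(math.log2(self.prob(p, t)) for p, t in pairs) / len(pairs)


def precision_at_k(model: BigramModel, lines: list[str], buggy: set[int], k: int) -> float:
    """Flag the k least natural lines and return the fraction that are truly buggy."""
    if k <= 0:
        raise ValueError("k must be positive")
    ranked = sorted(range(len(lines)), key=lambda i: model.line_entropy(lines[i]), reverse=True)
    return sum(1 for i in ranked[:k] if i in buggy) / k


if __name__ == "__main__":
    # Toy, hypothetical training corpus; not data from the study.
    corpus = ["if ( x != null ) {", "if ( y != null ) {", "return x ;", "return y ;"] * 5
    model = BigramModel(corpus)
    candidates = [
        "if ( x != null ) {",    # correct and common
        "if ( x = null ) {",     # seeded bug: '=' instead of '!='
        "log . warn ( msg ) ;",  # correct but rare in the training corpus
    ]
    for line in candidates:
        print(f"{model.line_entropy(line):.2f}  {line}")
    # The rare-but-correct line ranks first, so this prints 0.0.
    print(precision_at_k(model, candidates, buggy={1}, k=1))
```

In this toy run, the seeded bug is less natural than the correct line it replaces, but the rare yet correct logging line is even less natural and takes the top rank, so precision@1 is 0.0. This is only a small illustration of how "unnatural" and "buggy" can diverge; it makes no claim about the study's data.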
