Computer science
Source code
Empirical research
Programming language
Mathematics
Statistics
Authors
Yanjie Jiang,Hui Liu,J. Liu,Yuxia Zhang,Weixing Ji,Hao Zhong,Lu Zhang
Abstract
Natural languages are “natural” in that texts written in them are repetitive and predictable. Recent research indicates that programming languages share this characteristic (naturalness): source code also displays patterns of repetition and predictability. Notably, studies have shown that buggy code deviates from these patterns, in that buggy code is significantly less natural than bug-free code. In this paper, we conduct a large-scale empirical study to investigate whether code defects lead to unnaturalness of source code. Unlike existing studies, we leverage multiple large-scale, high-quality bug repositories in which bug-irrelevant changes in bug-fixing commits have been explicitly excluded. The subject applications cover different programming languages, and the study involves real-world software defects as well as defects injected automatically with well-known mutation operators. On the one hand, our evaluation results confirm existing studies in that buggy source code lines are often less natural than bug-free ones. On the other hand, our evaluation reveals some interesting new findings. First, fixing bugs does not significantly improve the naturalness of code lines, and the fixed lines are on average as unnatural as the buggy ones. This finding may suggest that software defects are not the root cause of source code’s unnaturalness, although there does exist a statistically significant correlation between software defects and source code’s naturalness. Second, defects in different programming languages have a similar effect on source code’s naturalness. The conclusions (i.e., buggy code is less natural, but fixing the bugs does not improve source code’s naturalness) hold regardless of programming language. Third, injecting defects automatically with well-known mutation operators does not significantly reduce the naturalness of the involved source code lines. This suggests that automatically injected defects may have a similar impact on the naturalness of source code as real-world defects inadvertently introduced by developers. Fourth, the defects’ impact on source code’s naturalness varies slightly among different categories of software defects. Although fixing bugs on average does not significantly improve the naturalness of the involved source code, fixing “checking”-related bugs does significantly improve it. Finally, locating buggy code lines according to naturalness alone is inaccurate, resulting in extremely low precision (less than one percent).