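Python (programming language)
Computer science
Machine learning
Artificial intelligence
Maintainability
Testability
Feature selection
Unit testing
Regression testing
Software
Toolbox
Software quality
Programming language
Software system
Software development
Software engineering
Software construction
Philosophy
Epistemology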
Authors
Natthida Vatanapakorn, Chitsutha Soomlek, Pusadee Seresangtakul
Identifier
DOI:10.1109/icsec56337.2022.10049330
Abstract
Python is an increasingly popular programming language used in various software projects and domains. Code smells in Python significantly affect maintainability, understandability, and testability. This paper proposes a machine learning-based code smell detection approach for Python programs. We trained eight machine learning models with a dataset based on 115 open-source Python projects, 39 class-level software metrics, and 22 function-level software metrics. We intended to identify five code smell types at both the class and function levels, i.e., long method, long parameter list, large class, long scope chaining, and long base class list. Correlation-based feature selection (CFS) and logistic regression-forward stepwise (conditional) selection were employed to improve the performance of the models. This research concluded with an empirical evaluation of the performance of the machine learning approaches against the tuning machine method. The results show that the machine learning method achieved 99.72% accuracy when identifying long method and long base class list. The machine learning-based code smell detection also outperformed the tuning machine method. We also found a set of high-impact features that contributed most when identifying each type of code smell.
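
To make the idea of function-level software metrics concrete, the following is a minimal sketch that extracts two such metrics (function length and parameter count) with Python's standard `ast` module and flags two of the smells named in the abstract. The function names, the choice of metrics, and the thresholds (`max_lines=50`, `max_params=5`) are assumptions made for illustration only; the abstract does not state which 22 function-level metrics were used, and the simple threshold rule here stands in for, and is not, the trained machine learning models.

```python
import ast


def function_metrics(source: str) -> list[dict]:
    """Collect two illustrative function-level metrics from Python source.

    These metrics are hypothetical examples, not the paper's metric set.
    """
    tree = ast.parse(source)
    rows = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            args = node.args
            # Count every declared parameter, including *args and **kwargs.
            n_params = (
                len(args.posonlyargs)
                + len(args.args)
                + len(args.kwonlyargs)
                + (1 if args.vararg else 0)
                + (1 if args.kwarg else 0)
            )
            # Physical lines spanned by the definition, signature included.
            length = node.end_lineno - node.lineno + 1
            rows.append({"name": node.name, "lines": length, "params": n_params})
    return rows


def label_smells(row: dict, max_lines: int = 50, max_params: int = 5) -> dict:
    """Flag smells with fixed thresholds (assumed values, for illustration)."""
    return {
        "long_method": row["lines"] > max_lines,
        "long_parameter_list": row["params"] > max_params,
    }


SAMPLE = '''
def short(a, b):
    return a + b

def wide(a, b, c, d, e, f, *rest, **options):
    return a
'''

metrics = function_metrics(SAMPLE)
labels = {row["name"]: label_smells(row) for row in metrics}
for name, flags in labels.items():
    print(name, flags)
```

In a learning-based setup such as the one the abstract describes, rows like those returned by `function_metrics` would serve as feature vectors for a classifier rather than being compared against hand-picked thresholds.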