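Computer science
Context (archaeology)
Class (philosophy)
Machine learning
Interpretation (philosophy)
Artificial intelligence
Data mining
Quality (philosophy)
Predictive modelling
Philosophy
Programming language
Paleontology
Biology
Epistemology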
Authors
Chakkrit Tantithamthavorn,Ahmed E. Hassan,Kenichi Matsumoto
Identifier
DOI:10.1109/tse.2018.2876537
Abstract
Defect models that are trained on class imbalanced datasets (i.e., the proportion of defective and clean modules is not equally represented) are highly susceptible to produce inaccurate prediction models. Prior research compares the impact of class rebalancing techniques on the performance of defect models but arrives at contradictory conclusions due to the use of different choice of datasets, classification techniques, and performance measures. Such contradictory conclusions make it hard to derive practical guidelines for whether class rebalancing techniques should be applied in the context of defect models. In this paper, we investigate the impact of class rebalancing techniques on the performance measures and interpretation of defect models. We also investigate the experimental settings in which class rebalancing techniques are beneficial for defect models. Through a case study of 101 datasets that span across proprietary and open-source systems, we conclude that the impact of class rebalancing techniques on the performance of defect prediction models depends on the used performance measure and the used classification techniques. We observe that the optimized SMOTE technique and the under-sampling technique are beneficial when quality assurance teams wish to increase AUC and Recall, respectively, but they should be avoided when deriving knowledge and understandings from defect models.
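To make the under-sampling idea mentioned in the abstract concrete, here is a minimal sketch in plain Python. It assumes simple random under-sampling of the majority class down to the size of the minority class; the function name `undersample`, the toy metric values, and the label encoding (1 = defective, 0 = clean) are illustrative assumptions and are not taken from the paper, which may configure the technique differently.

```python
import random


def undersample(rows, labels, seed=0):
    """Randomly drop majority-class rows until both classes have equal size.

    Assumes binary labels: 1 = defective module, 0 = clean module.
    """
    rng = random.Random(seed)
    defective = [i for i, y in enumerate(labels) if y == 1]
    clean = [i for i, y in enumerate(labels) if y == 0]
    # The shorter index list is the minority class, the longer the majority.
    minority, majority = sorted((defective, clean), key=len)
    # Keep every minority row plus an equally sized random majority subset.
    kept = sorted(minority + rng.sample(majority, len(minority)))
    return [rows[i] for i in kept], [labels[i] for i in kept]


# Hypothetical toy data: 10 modules described by one metric, 2 of them defective.
rows = [[loc] for loc in (120, 45, 300, 80, 15, 60, 210, 33, 95, 500)]
labels = [0, 0, 1, 0, 0, 0, 0, 0, 0, 1]
balanced_rows, balanced_labels = undersample(rows, labels)
```

Only the training data would normally be rebalanced in this way; evaluating on rebalanced test data would no longer reflect the class proportions a model faces in practice.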