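Computer science
Oversampling
Data mining
Class (philosophy)
Machine learning
Artificial intelligence
Key (lock)
Computer network
Computer security
Bandwidth (computing)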
Authors
Nur Athirah Azhar, Muhammad Syafiq Mohd Pozi, Aniza Mohamed Din, Adam Jatowt
Identifier
DOI:10.1109/tkde.2022.3179381
Abstract
Many binary-class datasets in real-life applications are affected by the class imbalance problem. Data complexities such as noisy examples, class overlap, and small disjuncts are observed to play a key role in producing poor classification performance, and they tend to occur together with class imbalance. The Synthetic Minority Oversampling Technique (SMOTE) is a well-known method for re-balancing the number of examples in imbalanced datasets. However, SMOTE cannot effectively tackle these data complexities and can even magnify them, and its classification performance is still not satisfactory. Various SMOTE variants have therefore been proposed to overcome these downsides, either by combining SMOTE with other algorithms or by modifying the SMOTE algorithm itself. This paper comparatively reviews the algorithms applied in SMOTE variants and investigates which data complexities are addressed by which variants. A series of experiments is conducted on 24 binary-class imbalanced datasets to observe how the data complexity measures change after the SMOTE variants are applied. Evaluation metrics such as G-Mean and F1-Score are also analyzed to investigate the differences in classification performance between the SMOTE variants.
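As a rough illustration of the ideas named in the abstract, the sketch below shows the basic SMOTE interpolation step together with the G-Mean and F1-Score metrics. It is a simplified, standard-library-only sketch and not the code used in the paper; the function names `smote_sketch` and `g_mean_and_f1`, the neighbour count `k`, and the toy data are assumptions made for illustration.

```python
import math
import random


def smote_sketch(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority points (hypothetical helper).

    Each synthetic point lies on the line segment between a randomly
    chosen minority point and one of its k nearest minority neighbours.
    Requires at least two minority points.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of base (base itself excluded by identity)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def g_mean_and_f1(y_true, y_pred, positive=1):
    """Return (G-Mean, F1-Score) for binary labels (hypothetical helper)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    g_mean = math.sqrt(sensitivity * specificity)
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return g_mean, f1


# Toy minority class: the four corners of the unit square (assumed data).
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
synthetic = smote_sketch(minority, n_new=5)
g_mean, f1 = g_mean_and_f1([1, 1, 0, 0], [1, 0, 0, 0])
```

Because every synthetic point is an interpolation between two existing minority points, noisy or overlapping minority examples are propagated into the new samples, which is one way plain SMOTE can magnify the data complexities the abstract describes.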