过采样
噪音(视频)
计算机科学
树(集合论)
人工智能
随机森林
班级(哲学)
模式识别(心理学)
合成数据
数据挖掘
机器学习
数学
带宽(计算)
图像(数学)
计算机网络
数学分析
标识
DOI:10.1142/s0218001422590388
摘要
The classification is usually degraded due to the imbalanced class distribution. Synthetic minority oversampling technique (SMOTE) has been successful in improving imbalanced classification and has received great praise. Overgeneralization is one of the most challenges in SMOTE. Although multiple SMOTE-based variations are proposed against overgeneralization, they still have the following shortcomings: (a) creating too many synthetic samples in high-density regions; (b) removing suspicious noise directly instead of modifying them; (c) relying on many parameters. This paper proposes a new SMOTE based on adaptive noise optimization and fast search for local sets (SMOTE-ANO-FLS) to overcome the overgeneralization and the shortcomings of existing works. First, SMOTE-ANO-FLS uses the [Formula: see text]-D tree to fast search the local sets for each sample. Second, a new noise detection method based on local sets and the imbalanced ratio is proposed to detect suspicious noise. Third, a new adaptive noise optimization method is proposed to modify detected suspicious noise instead of removing them. Finally, a new probability weight based on local sets is proposed to help create more synthetic minority class samples in borderline and sparse regions. The effectiveness of SMOTE-ANO-FLS is proven by employing 7 oversampling methods and random forest on the extensive synthetic and real data sets.
科研通智能强力驱动
Strongly Powered by AbleSci AI