Oversampling
Pattern recognition (psychology)
Artificial intelligence
Sensitivity (control systems)
Algorithm
Gaussian distribution
Computer science
Filter (signal processing)
Boundary (topology)
Machine learning
Mathematics
Physics
Engineering
Bandwidth (computing)
Quantum mechanics
Computer network
Electronic engineering
Computer vision
Mathematical analysis
Authors
Zhaozhao Xu, Derong Shen, Yue Kou, Tiezheng Nie
Identifier
DOI: 10.1109/tnnls.2022.3197156
Abstract
Data imbalance is a common phenomenon in machine learning. In imbalanced data classification, minority samples are far fewer than majority samples, which makes it difficult for classifiers to learn the minority class effectively. The synthetic minority oversampling technique (SMOTE) improves the sensitivity of classifiers to the minority class by synthesizing minority samples without repetition. However, the process of synthesizing new samples in the SMOTE algorithm may introduce problems such as "noisy samples" and "boundary samples." To address these problems, we propose a synthetic minority oversampling technique based on Gaussian mixture model filtering (GMF-SMOTE). GMF-SMOTE uses the expectation-maximization (EM) algorithm based on the Gaussian mixture model to group the imbalanced data. Then, an EM filtering algorithm is used to filter out the "noisy samples" and "boundary samples" in the resulting subclasses. Finally, two dynamic oversampling ratios are designed to synthesize majority and minority samples. Experimental results show that GMF-SMOTE outperforms traditional oversampling algorithms on 20 UCI datasets. The population averages of the sensitivity and specificity indexes of random forest (RF) on the UCI datasets synthesized by GMF-SMOTE are 97.49% and 97.02%, respectively. In addition, we also record the G-mean and MCC indexes of the RF, which are 97.32% and 94.80%, respectively, significantly better than those of the traditional oversampling algorithms. More importantly, two statistical tests show that GMF-SMOTE is significantly better than the traditional oversampling algorithms.
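The abstract outlines a three-step pipeline: group the data with a Gaussian mixture model fitted by EM, filter out noisy and boundary samples, then oversample. The sketch below illustrates that general idea in Python using scikit-learn's GaussianMixture and imbalanced-learn's SMOTE; the responsibility threshold, the helper name gmm_filter_then_smote, and the use of SMOTE's default balancing strategy are illustrative assumptions, not the authors' GMF-SMOTE implementation (which uses its own filtering rule and two dynamic oversampling ratios).

from sklearn.mixture import GaussianMixture
from imblearn.over_sampling import SMOTE

def gmm_filter_then_smote(X, y, n_components=3, min_responsibility=0.9, random_state=0):
    # Fit a Gaussian mixture model by expectation-maximization to group the data.
    gmm = GaussianMixture(n_components=n_components, random_state=random_state).fit(X)
    # Responsibility of each sample's best-matching component; low values serve
    # here as a rough proxy for the "noisy" and "boundary" samples to be filtered.
    responsibility = gmm.predict_proba(X).max(axis=1)
    keep = responsibility >= min_responsibility  # assumed filtering criterion
    X_filtered, y_filtered = X[keep], y[keep]
    # Oversample the filtered data; SMOTE's default strategy equalizes class
    # counts, whereas GMF-SMOTE designs two dynamic oversampling ratios.
    X_res, y_res = SMOTE(random_state=random_state).fit_resample(X_filtered, y_filtered)
    return X_res, y_res

In practice the component count and responsibility threshold would need tuning per dataset; the paper's actual grouping and filtering criteria are given in the full text.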