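Oversampling
Subspace topology
MNIST database
Class (philosophy)
Artificial intelligence
Computer science
Pattern recognition (psychology)
Sampling (signal processing)
Machine learning
Set (abstract data type)
Mathematics
Computer vision
Deep learning
Computer network
Filter (signal processing)
Programming language
Bandwidth (computing)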
Authors
Tianjun Li, Yingxu Wang, Licheng Liu, Long Chen, Chih-Yao Chen
Identifier
DOI:10.1016/j.ins.2022.11.108
Abstract
In pattern classification, the class imbalance problem arises when the number of observations in some classes differs significantly from that in others, which biases the learned classifiers. One possible solution is to re-balance the training set by over-sampling the minority class. However, over-sampling tends to push the classification boundary toward the majority class, so recall increases while precision decreases. To avoid this and better handle class imbalance, this paper proposes a new over-sampling method, Subspace-based Minority Over-Sampling (SMO). The approach assumes that each category of samples is formed by common and unique characteristics, and that these characteristics can be extracted by a subspace. To obtain balanced data, the common part is over-sampled to depict the minority class more accurately, and the unique part can be expanded by generative methods. The balanced data are then obtained by mapping the samples generated in the subspace back to the original space. Experimental results demonstrate that SMO can model complex data distributions and outperforms both classical and recently proposed over-sampling algorithms. SMO can also be used to generate simple images, and its generated MNIST images can be clearly identified by both human and machine vision.
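As a rough illustration of the general idea of generating minority samples in a subspace and restoring them to the original space, the following Python sketch may help. It is not the authors' SMO algorithm: the abstract does not say how the subspace is built or which generative method is used, so PCA (via SVD) and Gaussian sampling of the subspace coefficients are swapped in as stand-in assumptions, and the sketch does not separate common from unique characteristics. The function name `subspace_oversample` and its parameters are hypothetical.

```python
import numpy as np

def subspace_oversample(X_min, n_new, n_components=2, seed=0):
    """Illustrative stand-in, NOT the paper's SMO: PCA subspace + Gaussian sampling."""
    rng = np.random.default_rng(seed)
    mean = X_min.mean(axis=0)
    centered = X_min - mean
    # Rows of Vt are orthonormal principal directions of the minority class.
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    basis = Vt[:n_components]                 # shape (k, d)
    coeffs = centered @ basis.T               # minority samples in the subspace, (n, k)
    # Assumed generative step: draw new coefficients from a per-component Gaussian.
    std = coeffs.std(axis=0)
    new_coeffs = rng.normal(0.0, std, size=(n_new, n_components))
    # Restore the generated subspace points to the original feature space.
    return mean + new_coeffs @ basis

# Toy minority class: 20 samples with 5 features (synthetic data for the demo only).
X_min = np.random.default_rng(1).normal(size=(20, 5))
X_new = subspace_oversample(X_min, n_new=30)
print(X_new.shape)
```

The synthetic samples would then be appended to the training set until the classes are balanced. A smaller `n_components` keeps generated points closer to the dominant structure of the minority class, at the cost of less variety.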