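Support vector machine
Computer science
Machine learning
Artificial intelligence
Sampling (signal processing)
Ensemble learning
Class (philosophy)
Data mining
Context (archaeology)
Oversampling
Pattern recognition (psychology)
Filter (signal processing)
Computer vision
Paleontology
Computer network
Bandwidth (computing)
Biology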
Authors
Liu Yang, Xiaohui Yu, Jimmy Xiangji Huang, Aijun An
Identifier
DOI: 10.1016/j.ipm.2010.11.007
Abstract
Learning from imbalanced datasets is difficult. The insufficient information associated with the minority class impedes a clear understanding of the inherent structure of the dataset. Most existing classification methods tend not to perform well on minority class examples when the dataset is extremely imbalanced, because they aim to optimize overall accuracy without considering the relative distribution of each class. In this paper, we study the performance of SVMs, which have gained great success in many real applications, in the imbalanced data context. Through empirical analysis, we show that SVMs may suffer from biased decision boundaries, and that their prediction performance drops dramatically when the data is highly skewed. We propose to combine an integrated sampling technique, which incorporates both over-sampling and under-sampling, with an ensemble of SVMs to improve the prediction performance. Extensive experiments show that our method outperforms individual SVMs as well as several other state-of-the-art classifiers.
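The general idea in the abstract, building balanced training sets by over-sampling the minority class and under-sampling the majority class, then combining several classifiers trained on those sets, can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not specify the sampling scheme, so the sketch assumes plain random over-sampling (with replacement) and random under-sampling (without replacement) and majority voting. The names `balanced_sample`, `train_ensemble`, `predict`, and `centroid_learner` are hypothetical, and the nearest-centroid base learner is only a dependency-free stand-in where an SVM would be used.

```python
import random
from collections import Counter


def balanced_sample(majority, minority, size, rng):
    """Return `size` examples per class (assumed scheme, not the paper's)."""
    maj = rng.sample(majority, size)                     # under-sample, no replacement
    mino = [rng.choice(minority) for _ in range(size)]   # over-sample, with replacement
    return maj, mino


def train_ensemble(majority, minority, train_fn, n_members=5, size=None, seed=0):
    """Train one base learner per independently drawn balanced sample."""
    rng = random.Random(seed)
    if size is None:
        # Assumption: meet in the middle between the two class sizes.
        size = (len(majority) + len(minority)) // 2
    size = min(size, len(majority))
    members = []
    for _ in range(n_members):
        maj, mino = balanced_sample(majority, minority, size, rng)
        X = maj + mino
        y = [0] * len(maj) + [1] * len(mino)  # 0 = majority, 1 = minority
        members.append(train_fn(X, y))
    return members


def predict(members, x):
    """Majority vote over the ensemble members."""
    votes = Counter(member(x) for member in members)
    return votes.most_common(1)[0][0]


def centroid_learner(X, y):
    """Stand-in base learner (NOT an SVM): nearest class centroid."""
    def mean(points):
        return [sum(coord) / len(points) for coord in zip(*points)]

    def dist2(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    c0 = mean([x for x, label in zip(X, y) if label == 0])
    c1 = mean([x for x, label in zip(X, y) if label == 1])
    return lambda x: 0 if dist2(x, c0) <= dist2(x, c1) else 1


# Synthetic, illustrative data: 200 majority points vs. 10 minority points.
data_rng = random.Random(42)
majority = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(200)]
minority = [(data_rng.gauss(3, 1), data_rng.gauss(3, 1)) for _ in range(10)]

members = train_ensemble(majority, minority, centroid_learner, n_members=5)
print(predict(members, (3.0, 3.0)), predict(members, (0.0, 0.0)))
```

To get closer to the setting the abstract describes, `train_fn` could wrap an SVM implementation (for example, fitting scikit-learn's `sklearn.svm.SVC` and returning a function that calls its `predict`), leaving the sampling and voting code unchanged. An odd `n_members` avoids tied votes in the binary case.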