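Machine learning
Ensemble learning
Pattern recognition (psychology)
Classifier (UML)
Random forest
Relevance vector machine
Oversampling
Statistical classification
Feature selection
Sampling (signal processing)
Extreme learning machine
Multiclass classification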
Authors
Chuanxia Jian,Jian Gao,Yinhui Ao
Source
Journal: Neurocomputing
[Elsevier]
Date: 2016-06-12
Volume (issue): 193 (193): 115-122
Citations: 71
Identifier
DOI:10.1016/j.neucom.2016.02.006
Abstract
Minority examples carry too little information to represent the inherent structure of the dataset, so existing classification methods predict the minority class with low accuracy. Over- and under-sampling methods help to increase the prediction accuracy of the minority. However, these methods either lose important information or add trivial information for classification, which degrades the prediction accuracy of the minority. Therefore, this paper proposes a new different contribution sampling method (DCS), based on the different contributions of the support vectors (SVs) and the nonsupport vectors (NSVs) to classification. The DCS method uses the biased support vector machine (B-SVM) to identify the SVs and the NSVs of an imbalanced dataset and applies a different sampling method to each. Specifically, the synthetic minority over-sampling technique (SMOTE) re-samples the SVs in the minority, and the random under-sampling technique (RUS) re-samples the NSVs in the majority. Examples are labeled by an ensemble of support vector machines (SVMen). Experiments are carried out on imbalanced datasets selected from the UCI, AVU06a, Statlog, DP01a, JP98a and CWH03a repositories. Experimental results show that, on these imbalanced datasets, the proposed DCS method achieves better Receiver Operating Characteristic (ROC) curve performance than the other methods. In terms of the geometric mean prediction accuracy (G-mean), the proposed DCS method improves by 20.80%, 5.97%, 8.66% and 9.35% over the NS, the US, the SMOTE and the ROS, respectively.
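The resampling idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the SV/NSV split is already available as a boolean mask `is_sv` (in practice it might come from a fitted SVM, for example the `support_` indices of scikit-learn's `SVC`), it keeps minority NSVs and majority SVs unchanged, and it uses a simplified SMOTE-style interpolation. The function names `smote_like`, `dcs_resample` and `g_mean`, and the parameters `n_synthetic`, `keep_ratio` and `k`, are hypothetical. The abstract does not define G-mean, so the sketch uses the common binary form, the square root of sensitivity times specificity.

```python
import math
import random


def smote_like(points, n_new, k, rng):
    """Simplified SMOTE: interpolate between a point and one of its k nearest neighbours."""
    synthetic = []
    if len(points) < 2:
        return synthetic  # nothing to interpolate with
    for _ in range(n_new):
        i = rng.randrange(len(points))
        base = points[i]
        others = [p for j, p in enumerate(points) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def dcs_resample(X, y, is_sv, minority=1, n_synthetic=None,
                 keep_ratio=0.5, k=5, seed=0):
    """Over-sample minority SVs, under-sample majority NSVs (assumed reading of DCS)."""
    rng = random.Random(seed)
    kept, min_sv, maj_nsv = [], [], []
    for x, label, sv in zip(X, y, is_sv):
        if label == minority and sv:
            min_sv.append(list(x))
            kept.append((list(x), label))      # original minority SVs are retained
        elif label != minority and not sv:
            maj_nsv.append((list(x), label))   # candidates for random under-sampling
        else:
            kept.append((list(x), label))      # assumption: left untouched
    if n_synthetic is None:
        n_synthetic = len(min_sv)
    for point in smote_like(min_sv, n_synthetic, k, rng):
        kept.append((point, minority))
    n_keep = round(keep_ratio * len(maj_nsv))  # keep_ratio is assumed to lie in [0, 1]
    kept.extend(rng.sample(maj_nsv, n_keep))
    return [x for x, _ in kept], [label for _, label in kept]


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity and specificity for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)
```

Calling `dcs_resample(X, y, is_sv)` returns a rebalanced `(X_new, y_new)` pair that could then be passed to any classifier, and `g_mean(y_true, y_pred)` scores the resulting predictions. The B-SVM step and the SVM ensemble described in the abstract are deliberately left out of this sketch.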