A new sampling method for classifying imbalanced data based on support vector machine ensemble

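Keywords: machine learning, ensemble learning, pattern recognition, classifier, random forest, relevance vector machine, oversampling, statistical classification, feature selection, sampling, extreme learning machine, multiclass classification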
Authors
Chuanxia Jian, Jian Gao, Yinhui Ao
Source
Journal: Neurocomputing [Elsevier]
Volume 193, pp. 115-122; cited by: 71
Identifier
DOI: 10.1016/j.neucom.2016.02.006
Abstract

The insufficient information carried by minority examples cannot accurately represent the inherent structure of the dataset, so existing classification methods achieve low prediction accuracy on the minority class. Over- and under-sampling methods help to increase the prediction accuracy of the minority. However, these two methods either lose important information or add trivial information for classification, which in turn affects the prediction accuracy of the minority. Therefore, this paper proposes a new different contribution sampling method (DCS) based on the different contributions of the support vectors (SVs) and the nonsupport vectors (NSVs) to classification. The proposed DCS method uses the biased support vector machine (B-SVM) to identify the SVs and the NSVs of imbalanced data and then applies a different sampling method to each: the synthetic minority over-sampling technique (SMOTE) re-samples the SVs in the minority class, and the random under-sampling technique (RUS) re-samples the NSVs in the majority class. Examples are labeled by an ensemble of support vector machines (SVMen). Experiments are carried out on imbalanced datasets selected from the UCI, AVU06a, Statlog, DP01a, JP98a and CWH03a repositories. Experimental results show that, on these imbalanced datasets, the proposed DCS method achieves better performance in terms of the Receiver Operating Characteristic (ROC) curve than the other methods. Compared with NS, US, SMOTE and ROS, DCS improves the geometric mean prediction accuracy G-mean by 20.80%, 5.97%, 8.66% and 9.35%, respectively.
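The abstract describes DCS as a pipeline: separate SVs from NSVs, apply SMOTE to the minority SVs and RUS to the majority NSVs, label examples with an SVM ensemble, and score the result with G-mean (the geometric mean of the true positive rate and the true negative rate). The following is a minimal sketch of that pipeline in Python with NumPy and scikit-learn, not the authors' implementation. Several details are assumptions not given in the abstract: a class-weighted `SVC` stands in for the B-SVM; minority NSVs and majority SVs are kept unchanged; the retention fraction `keep_nsv`, the SMOTE neighbour count `k` and the "balance the classes" target are illustrative choices; each ensemble member is trained on a differently seeded re-sample and members are combined by majority vote; labels are binary with the minority class coded as 1. All function names are hypothetical.

```python
import numpy as np
from sklearn.svm import SVC


def smote(X_min, n_new, k=5, rng=None):
    """Basic SMOTE: interpolate between a sample and one of its k nearest
    neighbours drawn from the same (minority) set."""
    rng = np.random.default_rng(rng)
    n = len(X_min)
    if n_new <= 0 or n < 2:
        return np.empty((0, X_min.shape[1]))
    k = min(k, n - 1)
    # Pairwise squared distances; a point is never its own neighbour.
    d = ((X_min[:, None, :] - X_min[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d, np.inf)
    neighbours = np.argsort(d, axis=1)[:, :k]
    base = rng.integers(0, n, size=n_new)
    nbr = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (X_min[nbr] - X_min[base])


def dcs_resample(X, y, minority=1, keep_nsv=0.5, rng=None):
    """Hypothetical DCS-style re-sampling: SMOTE on minority SVs,
    RUS on majority NSVs, everything else kept as-is."""
    rng = np.random.default_rng(rng)
    # Assumption: class-weighted SVC as a stand-in for the B-SVM.
    svm = SVC(kernel="rbf", class_weight="balanced").fit(X, y)
    is_sv = np.zeros(len(y), dtype=bool)
    is_sv[svm.support_] = True
    min_mask = y == minority
    maj_mask = ~min_mask
    # RUS on majority NSVs: keep a random fraction keep_nsv of them.
    maj_nsv = np.flatnonzero(maj_mask & ~is_sv)
    n_keep = int(round(keep_nsv * len(maj_nsv)))
    kept_nsv = rng.choice(maj_nsv, size=n_keep, replace=False)
    keep = np.concatenate(
        [np.flatnonzero(min_mask), np.flatnonzero(maj_mask & is_sv), kept_nsv]
    )
    X_keep, y_keep = X[keep], y[keep]
    # SMOTE on minority SVs until the class counts are roughly equal.
    n_new = int((y_keep != minority).sum() - (y_keep == minority).sum())
    X_syn = smote(X[min_mask & is_sv], n_new, rng=rng)
    X_out = np.vstack([X_keep, X_syn])
    y_out = np.concatenate([y_keep, np.full(len(X_syn), minority)])
    return X_out, y_out


def fit_svm_ensemble(X, y, n_members=5, minority=1, seed=0):
    """Train each SVM on its own seeded DCS-style re-sample (assumption)."""
    members = []
    for m in range(n_members):
        Xr, yr = dcs_resample(X, y, minority=minority, rng=seed + m)
        members.append(SVC(kernel="rbf").fit(Xr, yr))
    return members


def predict_vote(members, X):
    """Majority vote over members; assumes binary labels {0, 1}."""
    votes = np.stack([clf.predict(X) for clf in members])
    return (votes.mean(axis=0) > 0.5).astype(int)


def g_mean(y_true, y_pred, minority=1):
    """G-mean = sqrt(TPR * TNR), with the minority class as positive."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    pos = y_true == minority
    tpr = (y_pred[pos] == minority).mean()
    tnr = (y_pred[~pos] != minority).mean()
    return float(np.sqrt(tpr * tnr))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 1, (200, 2)), rng.normal(2, 1, (20, 2))])
    y = np.array([0] * 200 + [1] * 20)
    ensemble = fit_svm_ensemble(X, y)
    print("training G-mean:", g_mean(y, predict_vote(ensemble, X)))
```

Because the stand-in SVM is deterministic for a fixed training set, refitting it for every ensemble member is redundant; computing the SV/NSV split once and reusing it would be a straightforward optimization. An odd `n_members` avoids ties in the vote.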