Keywords
Classifier (UML)
Artificial intelligence
Receiver operating characteristic
Naive Bayes classifier
Mathematics
Pattern recognition (psychology)
Convex hull
Class (philosophy)
Machine learning
Computer science
Prior probability
Sensitivity (control systems)
Training set
Bayes classifier
Cutoff
Convex combination
Random subspace method
Support vector machine
Statistical classification
Statistics
Authors
Chawla, N. V.; Bowyer, K. W.; Hall, L. O.; Kegelmeyer, W. P.
Source
Journal: Cornell University - arXiv
Date: 2011-06-09
Citations: 59
Identifiers
DOI: 10.48550/arXiv.1106.1813
Abstract
An approach to the construction of classifiers from imbalanced datasets is described. A dataset is imbalanced if the classification categories are not approximately equally represented. Real-world datasets are often composed predominantly of "normal" examples with only a small percentage of "abnormal" or "interesting" examples, and the cost of misclassifying an abnormal (interesting) example as normal is often much higher than the cost of the reverse error. Under-sampling of the majority (normal) class has been proposed as a good means of increasing the sensitivity of a classifier to the minority class. This paper shows that combining our method of over-sampling the minority (abnormal) class with under-sampling the majority (normal) class achieves better classifier performance (in ROC space) than under-sampling the majority class alone, and also better performance than varying the loss ratios in Ripper or the class priors in Naive Bayes. Our method of over-sampling the minority class involves creating synthetic minority-class examples. Experiments are performed using C4.5, Ripper, and a Naive Bayes classifier. The method is evaluated using the area under the Receiver Operating Characteristic curve (AUC) and the ROC convex hull strategy.
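The abstract describes the over-sampling method only at a high level: synthetic minority examples are created rather than duplicated. A minimal sketch of that idea follows, assuming the standard SMOTE interpolation rule (a synthetic point is placed a random fraction of the way from a minority sample toward one of its k nearest minority-class neighbors); the function name smote_sketch and its parameters are illustrative, not taken from the paper.

```python
import numpy as np

def smote_sketch(X_min, n_synthetic, k=5, seed=None):
    """Sketch of SMOTE-style over-sampling: interpolate between each
    minority sample and one of its k nearest minority-class neighbors.

    X_min       : (n, d) array of minority-class feature vectors
    n_synthetic : number of synthetic samples to generate
    """
    rng = np.random.default_rng(seed)
    n, d = X_min.shape
    k = min(k, n - 1)                      # cannot have more neighbors than samples

    # Pairwise squared distances within the minority class only.
    d2 = ((X_min[:, None, :] - X_min[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)           # a sample is not its own neighbor
    nn = np.argsort(d2, axis=1)[:, :k]     # indices of the k nearest neighbors

    synthetic = np.empty((n_synthetic, d))
    for i in range(n_synthetic):
        j = rng.integers(n)                # pick a minority sample at random
        nb = X_min[rng.choice(nn[j])]      # one of its k nearest neighbors
        gap = rng.random()                 # interpolation factor in [0, 1]
        synthetic[i] = X_min[j] + gap * (nb - X_min[j])
    return synthetic

# Toy usage: 20 minority points in 4 dimensions, doubled with synthetics.
X_min = np.random.default_rng(0).normal(size=(20, 4))
X_new = smote_sketch(X_min, n_synthetic=20, k=5, seed=1)
```

Per the abstract, this over-sampling step would be combined with random under-sampling of the majority class before training a classifier such as C4.5, Ripper, or Naive Bayes.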