RSS
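Oversampling
Simple random sampling
Bootstrap aggregating
Computer science
Random forest
Sampling (signal processing)
Artificial intelligence
Ensemble learning
Machine learning
Support vector machine
Benchmark (surveying)
Generalization
Data mining
Sample size determination
Statistics
Mathematics
Mathematical analysis
Operating system
Sociology
Demography
Filter (signal processing)
Computer vision
Computer network
Geodesy
Geography
Bandwidth (computing)
Population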
Authors
Jieting Wang, Feijiang Li, Jue Li, Chenping Hou, Yuhua Qian, Jiye Liang
Identifier
DOI:10.1109/tnnls.2023.3270559
Abstract
The bagging method has received wide application and attention in recent years owing to its good performance and simple framework. It has facilitated the advanced random forest method and the accuracy-diversity ensemble theory. Bagging is an ensemble method based on simple random sampling (SRS) with replacement. However, SRS is only the most basic sampling method in statistics, which offers other, more advanced sampling methods for probability density estimation. In imbalanced ensemble learning, down-sampling, over-sampling, and SMOTE have been proposed for generating base training sets. However, these methods aim to change the underlying distribution of the data rather than to simulate it better. The ranked set sampling (RSS) method uses auxiliary information to obtain more effective samples. This article proposes a bagging ensemble method based on RSS, which uses an ordering of objects related to the class to obtain more effective training sets. To explain its performance, we give a generalization bound for the ensemble from the perspective of posterior probability estimation and Fisher information. Because an RSS sample has higher Fisher information than an SRS sample, the presented bound theoretically explains the better performance of RSS-Bagging. Experiments on 12 benchmark datasets demonstrate that RSS-Bagging performs statistically better than SRS-Bagging when the base classifiers are multinomial logistic regression (MLR) and support vector machine (SVM).
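The contrast the abstract draws between SRS with replacement and ranked set sampling can be illustrated with a minimal sketch. This is not the authors' RSS-Bagging algorithm: it implements the textbook balanced RSS scheme on a one-dimensional synthetic population, and it assumes perfect ranking by the value itself, whereas the paper ranks objects by auxiliary information related to the class. The function names, the set size of 4, the 5 cycles, and the Gaussian population are all illustrative assumptions.

```python
import random
import statistics


def srs_with_replacement(population, n, rng):
    """Simple random sampling with replacement (the bootstrap draw used by bagging)."""
    return [rng.choice(population) for _ in range(n)]


def ranked_set_sample(population, set_size, cycles, rank_key, rng):
    """Balanced ranked set sampling (textbook form, hypothetical helper).

    In each cycle, for every rank r = 0..set_size-1, draw `set_size` units
    at random, order them by `rank_key`, and keep only the r-th one.
    Returns set_size * cycles units.
    """
    sample = []
    for _ in range(cycles):
        for r in range(set_size):
            candidates = [rng.choice(population) for _ in range(set_size)]
            candidates.sort(key=rank_key)
            sample.append(candidates[r])
    return sample


def mean_estimate_variance(draw, trials, rng):
    """Empirical variance of the sample mean over repeated draws."""
    means = [statistics.fmean(draw(rng)) for _ in range(trials)]
    return statistics.pvariance(means)


rng = random.Random(0)
# Assumed synthetic population; the paper uses benchmark datasets instead.
population = [rng.gauss(0.0, 1.0) for _ in range(5000)]
set_size, cycles = 4, 5
n = set_size * cycles  # both schemes return the same number of units

var_srs = mean_estimate_variance(
    lambda r: srs_with_replacement(population, n, r), 2000, rng
)
var_rss = mean_estimate_variance(
    lambda r: ranked_set_sample(population, set_size, cycles, lambda x: x, r),
    2000,
    rng,
)
print(f"variance of sample mean, SRS: {var_srs:.4f}")
print(f"variance of sample mean, RSS: {var_rss:.4f}")
```

With the same number of retained units, the RSS estimate of the mean should show a smaller variance than the SRS estimate, which is the sense in which an RSS sample is "more effective". In a bagging setting, each base training set would be drawn with a scheme like `ranked_set_sample` rather than `srs_with_replacement`; how the ranking variable is built from class information is specific to the paper and is not reproduced here.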