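Computer science
Machine learning
Artificial intelligence
Class (philosophy)
Ensemble learning
Noise (video)
Supervised learning
Pattern recognition (psychology)
Artificial neural network
Image (mathematics)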
Authors
Rashida Hasan,Chee‐Hung Henry Chu
Identifier
DOI:10.1145/3605098.3635936
Abstract
The goal of machine learning is to approximate an unknown input function by learning based on a set of labeled training samples. Noisy labels due to class noise in the training data can have three negative consequences: (i) the prediction accuracy may decrease, (ii) the complexity of the model may increase, and (iii) the number of training examples needed may increase. To tackle this problem, we present a new ensemble-based filtering approach for identifying and eliminating class noise. In our approach, we build the ensemble filter by employing k-means clustering and classifier calibration. By using a high agreement rate, our heterogeneous ensemble filter is able to collect most of the clean data. We report experiments on eight binary and five multiclass datasets from UCI benchmarks to demonstrate that our proposed method is highly effective in label noise filtering. Experimental results show that our proposed method leads to significant performance improvement compared with the state-of-the-art baselines. A comparative analysis is conducted with respect to the two-stage ensemble filter, a reference homogeneous ensemble-based class noise filter, and mCRF, a reference multiclass label noise filter.
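The idea of keeping only samples on which an ensemble agrees at a high rate can be illustrated with a minimal sketch. This is a generic agreement-based filter, not the authors' method: it omits the k-means clustering and classifier calibration steps, and the function name `agreement_filter`, the parameter `min_agreement`, and the toy predictions are all hypothetical, chosen only for illustration.

```python
from typing import List, Sequence


def agreement_filter(
    labels: Sequence[int],
    member_predictions: Sequence[Sequence[int]],
    min_agreement: float = 0.8,
) -> List[int]:
    """Return indices of samples treated as clean.

    A sample is kept when the fraction of ensemble members whose
    prediction matches its given label is at least `min_agreement`.
    (Hypothetical helper; the threshold value is an assumption.)
    """
    n_members = len(member_predictions)
    if n_members == 0:
        raise ValueError("at least one ensemble member is required")

    kept = []
    for i, given_label in enumerate(labels):
        # Count members that agree with the (possibly noisy) given label.
        votes = sum(1 for preds in member_predictions if preds[i] == given_label)
        if votes / n_members >= min_agreement:
            kept.append(i)
    return kept


# Toy data: five samples, three ensemble members (made-up predictions).
labels = [0, 1, 1, 0, 1]
member_predictions = [
    [0, 1, 0, 0, 1],
    [0, 1, 0, 0, 1],
    [0, 1, 0, 1, 1],
]

# Full agreement required: samples 2 and 3 are flagged as suspect.
strict_clean = agreement_filter(labels, member_predictions, min_agreement=1.0)
# A looser threshold also keeps sample 3 (two of three members agree).
loose_clean = agreement_filter(labels, member_predictions, min_agreement=0.6)
```

A stricter threshold discards more samples but makes it less likely that a mislabeled one is kept, which is the trade-off a high agreement rate is meant to exploit.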