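Undersampling
Oversampling
Computer science
Machine learning
Random forest
Artificial intelligence
Robustness (evolution)
Sampling (signal processing)
Support vector machine
Data sampling
Data mining
Gene
Filter (signal processing)
Biochemistry
Chemistry
Computer vision
Bandwidth (computing)
Computer network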
Authors
Qingyong Wang, Yun Zhou, Weiming Zhang, TANG Zhangui, Xiaojing Chen
Identifier
DOI:10.1016/j.eswa.2020.113334
Abstract
Early diagnosis of cancer is an indispensable part of cancer research, and it has motivated many new machine learning approaches that assist disease identification from gene expression data. However, the rare occurrence of malignant tumors creates a challenge because of the potential over-fitting risk when training current models. Typically, various sampling methods (e.g., random oversampling and undersampling) are used to address this challenge by providing a balanced data distribution. However, these methods might discard potentially useful samples. In this paper, we propose an imbalanced sampling approach via self-paced learning (ISPL) that selects high-quality samples to improve robustness. The experimental results show that the proposed ISPL method increased classification accuracy by approximately 16% compared with the average performance obtained by other sampling methods. In addition, the new method successfully selected some important genes for further investigation.
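For readers unfamiliar with the baseline sampling methods named in the abstract, the following is a minimal sketch of random undersampling and random oversampling on a toy two-class dataset. It is not the ISPL method and is not taken from the paper; the function names `random_undersample` and `random_oversample`, the labels, and the toy data are all illustrative assumptions. It uses only the Python standard library.

```python
import random
from collections import Counter, defaultdict


def _group_by_class(samples, labels):
    """Collect samples into a dict keyed by class label."""
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    return by_class


def random_undersample(samples, labels, seed=0):
    """Hypothetical helper: randomly drop samples until every class
    has as many samples as the smallest class."""
    rng = random.Random(seed)
    by_class = _group_by_class(samples, labels)
    target = min(len(xs) for xs in by_class.values())
    # rng.sample draws without replacement, so the rest are discarded.
    return [(x, y) for y, xs in by_class.items() for x in rng.sample(xs, target)]


def random_oversample(samples, labels, seed=0):
    """Hypothetical helper: randomly duplicate samples until every class
    has as many samples as the largest class."""
    rng = random.Random(seed)
    by_class = _group_by_class(samples, labels)
    target = max(len(xs) for xs in by_class.values())
    balanced = []
    for y, xs in by_class.items():
        # Keep every original sample, then add random duplicates.
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        balanced.extend((x, y) for x in xs + extra)
    return balanced


# Toy imbalanced data (assumed for illustration): 8 "normal", 2 "tumor".
samples = list(range(10))
labels = ["normal"] * 8 + ["tumor"] * 2

under = random_undersample(samples, labels)
over = random_oversample(samples, labels)
print(Counter(y for _, y in under))
print(Counter(y for _, y in over))
```

In this sketch, undersampling balances the classes by throwing away six of the eight majority-class samples, which illustrates how potentially useful samples can be lost. Oversampling keeps every sample but repeats minority-class samples, which is one way the over-fitting risk mentioned in the abstract can arise.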