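Sampling (signal processing)
Support vector machine
Sample (material)
Principal component analysis
Pattern recognition (psychology)
Computer science
Statistics
Sample size determination
Artificial intelligence
Mathematics
Data mining
Chemistry
Filter (signal processing)
Chromatography
Computer vision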
Authors
Haitao Song, Hongyong Leng, Zhuoya Hou, Rui Gao, Cheng Chen, Chunzhi Meng, Jinshan Sun, Chenxi Li, Binlin Ma
Identifier
DOI:10.1016/j.pdpdt.2022.103059
Abstract
Owing to limitations in disease prevalence and hospital specificity, spectral data are often collected with unbalanced sample sizes. To address this problem, this research proposes a new sampling method, grouped-sampling, which is shown to be effective for unbalanced data. It avoids the over-fitting associated with over-sampling and overcomes the under-utilization of data in under-sampling. In this study, we applied grouped-sampling to two unbalanced datasets with sample proportions of 199:40 and 75:225, and then verified it with two classic models: PCA-SVM (Principal Component Analysis-Support Vector Machine) and the deep learning algorithm GoogLeNet. The accuracies on the two datasets were 85.11% and 96.15% with PCA-SVM and 85.10% and 84.61% with GoogLeNet. The F1-score was also evaluated to measure the classification balance of each sampling method, and the results show that the F1-score of grouped-sampling is always the highest compared with over-sampling and under-sampling. In summary, compared with traditional sampling methods, grouped-sampling predicts classes with smaller sample sizes better, which means that it can improve the balance of classification results and the potential for practical application. We therefore developed a grouped-sampling method, distinct from both under- and over-sampling, that greatly improves the accuracy and balance of predictions for unbalanced samples.
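To illustrate why the abstract reports F1-score alongside accuracy, the following minimal sketch computes both for a trivial classifier on data with the 199:40 class ratio mentioned above. It is not the authors' code: the labels, the always-predict-majority classifier, and the helper name `accuracy_and_f1` are assumptions made for illustration only.

```python
def accuracy_and_f1(y_true, y_pred, positive):
    """Return (accuracy, F1) with `positive` treated as the class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    # Guard against division by zero when the positive class is never predicted.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return accuracy, f1


# Hypothetical labels using the 199:40 ratio from the abstract.
y_true = ["majority"] * 199 + ["minority"] * 40
# Hypothetical classifier that ignores the minority class entirely.
y_pred = ["majority"] * 239

acc, f1 = accuracy_and_f1(y_true, y_pred, positive="minority")
print(f"accuracy = {acc:.4f}, minority-class F1 = {f1:.4f}")
```

In this sketch the classifier is right on 199 of 239 samples, so its accuracy looks reasonable, yet its F1-score for the minority class is zero because it never identifies a single minority sample. This is the kind of imbalance in classification results that an F1 comparison between sampling methods is meant to expose.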