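Preprocessor
Artificial intelligence
Linear discriminant analysis
Pattern recognition (psychology)
Data preprocessing
Computer science
Partial least squares regression
Extreme learning machine
Support vector machine
Mathematics
Biological system
Machine learning
Artificial neural network
Biology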
Authors
Hong Liu,Honggao Liu,Jieqing Li,Yuanzhong Wang
Abstract
Gastrodia elata Blume from different regions varies in growth conditions, soil type, and climate, which directly affects the content and quality of its medicinal components. Accurately identifying the geographical origin helps ensure the medicinal value of G. elata Bl., prevents the circulation of counterfeit products, and thus protects the interests and health of consumers. Attenuated total reflectance Fourier transform infrared (ATR‐FTIR) spectroscopy is a rapid and effective method for verifying the authenticity of traditional Chinese medicines. However, scattering effects in the spectra make it difficult to establish reliable discrimination models, so appropriate scatter correction is crucial for improving both the quality of the spectral data and the accuracy of the models. This study uses two ensemble preprocessing approaches: the first is series fusion of scatter correction technologies (SCSF), and the second is sequential preprocessing through orthogonalization (SPORT). Four discriminant models were established using a single scatter correction technique and the two ensemble preprocessing approaches. The results show that the data‐driven version of the soft independent modeling of class analogy (DD‐SIMCA) model built on multiplicative scatter correction (MSC) preprocessing has a sensitivity of 0.98 and a specificity of 0.91 and can effectively determine whether a sample of G. elata Bl. originates from Zhaotong. In addition, three types of discriminant models built using the ensemble preprocessing approaches, namely support vector machine (SVM), partial least squares discriminant analysis (PLS‐DA), and three gradient boosting machine (GBM) algorithms, show good classification and generalization capabilities. Among them, the SCSF‐PLS‐DA model performs best, with accuracies of 99.68% and 98.08% on the training and test sets, respectively, and an F1 score of 0.97; the SPORT‐SVM model achieves the second‐best classification ability. These results indicate that the ensemble preprocessing approaches can improve the success rate of geographical origin classification for G. elata Bl.
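As a rough illustration of the multiplicative scatter correction (MSC) step mentioned in the abstract, the sketch below applies the common textbook formulation: each spectrum is regressed on a reference spectrum (here the mean spectrum), and the fitted offset and slope are removed. This is not the authors' implementation; the function name `msc`, the synthetic band shape, and the gain and offset values are all assumptions made for the example.

```python
import numpy as np


def msc(spectra, reference=None):
    """Textbook-style MSC (hypothetical helper, not the authors' code).

    Each row of `spectra` is fitted as row ~ slope * reference + intercept,
    then corrected as (row - intercept) / slope.
    """
    spectra = np.asarray(spectra, dtype=float)
    if reference is None:
        # Assumption: use the mean spectrum as the reference.
        reference = spectra.mean(axis=0)
    corrected = np.empty_like(spectra)
    for i, row in enumerate(spectra):
        # Least-squares straight-line fit of this spectrum against the reference.
        slope, intercept = np.polyfit(reference, row, deg=1)
        corrected[i] = (row - intercept) / slope
    return corrected


# Synthetic data (assumed, for illustration only): one Gaussian band
# distorted by a different multiplicative gain and additive offset per sample.
wavenumbers = np.linspace(4000, 400, 200)
band = np.exp(-((wavenumbers - 1600) / 150) ** 2)
gains = np.array([0.8, 1.0, 1.3])
offsets = np.array([0.05, 0.0, -0.02])
raw = gains[:, None] * band + offsets[:, None]

corrected = msc(raw)
# With purely affine distortions, every corrected spectrum collapses
# onto the reference (mean) spectrum.
print(np.allclose(corrected, raw.mean(axis=0)))
```

Real ATR‐FTIR spectra also differ chemically between samples, so the corrected spectra would not coincide exactly as they do here; MSC only removes the offset and scaling component that can be explained by the reference.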