Evaluating Machine Learning Methods of Analyzing Multiclass Metabolomics

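Normalization (sociology); Computer science; Artificial intelligence; Machine learning; Metabolomics; Multiclass classification; Missing data; Database normalization; Data mining; Imputation (statistics); Data set; Pattern recognition (psychology); Support vector machine; Bioinformatics; Sociology; Biology; Anthropology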

Authors

Yaguo Gong, Wei Ding, Panpan Wang, Qibiao Wu, Xiaojun Yao, Qingxia Yang

Source

Journal: Journal of Chemical Information and Modeling [American Chemical Society]
Date: 2023-12-11 Volume (issue): 63 (24): 7628-7641 Citations: 12

Links

nih.gov, doi.org

Identifier

DOI: 10.1021/acs.jcim.3c01525

Abstract

Multiclass metabolomic studies have become popular for revealing the differences in multiple stages of complex diseases, various lifestyles, or the effects of specific treatments. In multiclass metabolomics, there are multiple data manipulation steps for analyzing raw data, which consist of data filtering, the imputation of missing values, data normalization, marker identification, sample separation, classification, and so on. In each step, several to dozens of machine learning methods can be chosen for the given data set, with potentially hundreds or thousands of method combinations in the whole data processing chain. Therefore, a clear understanding of these machine learning methods is helpful for selecting an appropriate method combination for obtaining stable and reliable analytical results of specific data. However, there has rarely been an overall introduction or evaluation of these methods based on multiclass metabolomic data. Herein, detailed descriptions of these machine learning methods in multiple data manipulation steps are reviewed. Moreover, an assessment of these methods was performed using a benchmark data set for multiclass metabolomics. First, 12 imputation methods for imputing missing values were evaluated based on the PSS (Procrustes statistical shape analysis) and NRMSE (normalized root-mean-square error) values. Second, 17 normalization methods for processing multiclass metabolomic data were evaluated by applying the PMAD (pooled median absolute deviation) value. Third, different methods of identifying markers of multiclass metabolomics were evaluated based on the CWrel (relative weighted consistency) value. Fourth, nine classification methods for constructing multiclass models were assessed using the AUC (area under the curve) value. Performance evaluations of machine learning methods are highly recommended to select the most appropriate method combination before performing the final analysis of the given data. 
Overall, detailed descriptions and evaluation of various machine learning methods are expected to improve analyses of multiclass metabolomic data.
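
The abstract names NRMSE and PMAD as evaluation criteria for imputation and normalization but gives no formulas. As a rough illustration only, the sketch below implements one common reading of each: NRMSE as the root-mean-square imputation error over the masked entries divided by the standard deviation of the true values, and PMAD as the per-class median absolute deviation of each metabolite, averaged over metabolites and classes. The function names `nrmse` and `pmad`, the array layout (samples in rows, metabolites in columns), and both definitions are assumptions made here, not taken from the paper, which may normalize or pool differently.

```python
import numpy as np


def nrmse(x_true, x_imputed, missing_mask):
    """Normalized RMSE over the entries flagged in missing_mask.

    Assumed definition: sqrt(mean squared error / variance of true values),
    computed only on the entries that were masked out and then imputed.
    """
    x_true = np.asarray(x_true, dtype=float)
    x_imputed = np.asarray(x_imputed, dtype=float)
    missing_mask = np.asarray(missing_mask, dtype=bool)
    truth = x_true[missing_mask]
    diff = truth - x_imputed[missing_mask]
    return float(np.sqrt(np.mean(diff ** 2) / np.var(truth)))


def pmad(x, labels):
    """Pooled median absolute deviation across classes.

    Assumed definition: for each class, take the MAD of every metabolite
    (column) within that class, average over metabolites, then average
    the per-class values. Lower values mean less within-class spread.
    """
    x = np.asarray(x, dtype=float)
    labels = np.asarray(labels)
    per_class = []
    for cls in np.unique(labels):
        block = x[labels == cls]                      # samples of one class
        med = np.median(block, axis=0)                # per-metabolite median
        mad = np.median(np.abs(block - med), axis=0)  # per-metabolite MAD
        per_class.append(mad.mean())
    return float(np.mean(per_class))
```

Under these assumptions, a lower NRMSE indicates imputed values closer to the held-out truth, and a lower PMAD after normalization indicates reduced within-class variability; comparing the same metric across candidate methods on the same data set is what makes the values meaningful.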
