Interpretability
Random forest
Artificial intelligence
Machine learning
Gradient boosting
Metabolomics
Computer science
Linear discriminant analysis
Tree (set theory)
Binary number
Binary classification
Boosting (machine learning)
Cheminformatics
Support vector machine
Mathematics
Bioinformatics
Biology
Mathematical analysis
Arithmetic
Source
Journal: PLOS ONE
[Public Library of Science]
Date: 2023-05-04
Volume/Issue: 18 (5): e0284315
Citations: 48
Identifiers
DOI: 10.1371/journal.pone.0284315
Abstract
Machine learning (ML) models are used in clinical metabolomics studies, most notably for biomarker discovery, to identify metabolites that discriminate between a case and a control group. To improve understanding of the underlying biomedical problem and to bolster confidence in these discoveries, model interpretability is germane. In metabolomics, partial least squares discriminant analysis (PLS-DA) and its variants are widely used, partly due to the model's interpretability via Variable Influence in Projection (VIP) scores, a global interpretation method. Herein, Tree-based Shapley Additive explanations (SHAP), an interpretable ML method grounded in game theory, was used to explain ML models with local explanation properties. In this study, ML experiments (binary classification) were conducted on three published metabolomics datasets using PLS-DA, random forests, gradient boosting, and extreme gradient boosting (XGBoost). Using one of the datasets, the PLS-DA model was explained using VIP scores, while one of the best-performing models, a random forest model, was interpreted using Tree SHAP. The results show that SHAP offers greater explanation depth than PLS-DA's VIP, making it a powerful method for rationalizing machine learning predictions from metabolomics studies.
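The two interpretation approaches contrasted in the abstract can be sketched in a few lines of Python. The block below is a minimal, illustrative sketch, not the authors' code: it assumes scikit-learn and the shap package, uses a synthetic stand-in for a metabolomics feature table, and introduces a hypothetical vip_scores helper. A PLS-DA model (PLS regression on 0/1 labels) is summarized globally with VIP scores, and a random forest is explained with Tree SHAP.

```python
# Illustrative sketch only (not the paper's code): PLS-DA VIP scores vs. Tree SHAP
# explanations of a random forest on a synthetic binary-classification dataset.
import numpy as np
import shap
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a metabolomics feature table (samples x metabolites).
X, y = make_classification(n_samples=200, n_features=50, n_informative=8,
                           random_state=0)
X = StandardScaler().fit_transform(X)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y,
                                                    random_state=0)

# --- Global interpretation: PLS-DA (PLS regression on 0/1 labels) with VIP ---
pls = PLSRegression(n_components=2).fit(X_train, y_train)

def vip_scores(pls_model):
    """Variable Influence in Projection: one global importance value per feature."""
    t = pls_model.x_scores_            # scores  (n_samples, n_components)
    w = pls_model.x_weights_           # weights (n_features, n_components)
    q = pls_model.y_loadings_.ravel()  # y loadings, one per component
    p = w.shape[0]
    ss = np.sum(t ** 2, axis=0) * q ** 2            # y-variance explained per component
    w_norm = (w / np.linalg.norm(w, axis=0)) ** 2   # normalized squared weights
    return np.sqrt(p * (w_norm @ ss) / ss.sum())

vip = vip_scores(pls)
print("Top features by VIP:", np.argsort(vip)[::-1][:5])

# --- Local interpretation: Tree SHAP on a random forest classifier ---
rf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X_train, y_train)
explainer = shap.TreeExplainer(rf)
shap_values = explainer.shap_values(X_test)

# Binary-classification output differs across shap versions: either a list of
# per-class arrays or a single (samples, features, classes) array.
sv = shap_values[1] if isinstance(shap_values, list) else shap_values[..., 1]

# Global view (mean |SHAP|) and a local explanation for a single test sample.
print("Top features by mean |SHAP|:", np.argsort(np.abs(sv).mean(axis=0))[::-1][:5])
print("Per-feature contributions for sample 0:", sv[0][:5])
```

VIP yields a single global ranking of features, whereas the SHAP array assigns a contribution to every feature for every individual sample, which is the local-explanation property the abstract highlights.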