Authors
Mohamed N. Triba,Laurence Le Moyec,Roland Amathieu,Corentine Goossens,Nadia Bouchemal,Pierre Nahon,Douglas N. Rutledge,Philippe Savarin
Abstract
Among all the software packages available for discriminant analysis based on projection to latent structures (PLS-DA) or orthogonal projection to latent structures (OPLS-DA), SIMCA (Umetrics, Umeå, Sweden) is the most widely used in the metabolomics field. SIMCA provides many parameters and tests to assess the quality of the computed model (the number of significant components, R2, Q2, pCV-ANOVA, and the permutation test). Significance thresholds for these parameters are strongly application-dependent. For the Q2 parameter, a significance threshold of 0.5 is generally accepted. However, during the last few years, many PLS-DA/OPLS-DA models built using SIMCA have been published with Q2 values lower than 0.5. The purpose of this opinion note is to point out that, in some circumstances frequently encountered in metabolomics, the values of these parameters strongly depend on the individuals that constitute the validation subsets. Because of the way the software assigns individuals to the calibration and validation subsets, a simple permutation of the dataset rows can, in several cases, lead to contradictory conclusions about the significance of the models when K-fold cross-validation is used. We believe that, when Q2 values lower than 0.5 are obtained, SIMCA users should at least verify that the quality parameters are stable under permutation of the rows in their dataset.
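To illustrate how row order can drive Q2 under K-fold cross-validation, here is a minimal Python sketch using only NumPy. It computes Q2 = 1 - PRESS/TSS for a PLS1 model fitted by NIPALS, used here as a stand-in for PLS-DA with the two classes coded -1/+1. Everything specific is an assumption, not SIMCA's implementation: folds are assigned deterministically from row position (row i goes to fold i mod K), K = 7, two components, unit-variance scaling, and the data are synthetic. The helpers `pls1_fit`, `pls1_predict` and `q2_kfold` are hypothetical names introduced for this illustration.

```python
import numpy as np


def pls1_fit(X, y, n_comp):
    """Fit a PLS1 model with NIPALS on unit-variance-scaled X (sketch)."""
    x_mean = X.mean(axis=0)
    x_std = X.std(axis=0, ddof=1)
    y_mean = y.mean()
    Xk = (X - x_mean) / x_std
    yk = y - y_mean
    W, P, q = [], [], []
    for _ in range(n_comp):
        w = Xk.T @ yk                 # weight vector: covariance of X with y
        w /= np.linalg.norm(w)
        t = Xk @ w                    # scores
        tt = t @ t
        p = Xk.T @ t / tt             # X loadings
        c = (yk @ t) / tt             # y loading
        Xk = Xk - np.outer(t, p)      # deflate X
        yk = yk - c * t               # deflate y
        W.append(w)
        P.append(p)
        q.append(c)
    W = np.column_stack(W)
    P = np.column_stack(P)
    q = np.array(q)
    B = W @ np.linalg.inv(P.T @ W) @ q  # regression coefficients (scaled X)
    return x_mean, x_std, y_mean, B


def pls1_predict(model, X):
    x_mean, x_std, y_mean, B = model
    return ((X - x_mean) / x_std) @ B + y_mean


def q2_kfold(X, y, n_comp=2, k=7):
    """Q2 = 1 - PRESS / TSS with folds fixed by row position (assumption)."""
    n = len(y)
    folds = np.arange(n) % k          # every k-th row lands in the same fold
    press = 0.0
    for f in range(k):
        val = folds == f
        model = pls1_fit(X[~val], y[~val], n_comp)
        resid = y[val] - pls1_predict(model, X[val])
        press += resid @ resid
    tss = np.sum((y - y.mean()) ** 2)
    return 1.0 - press / tss


# Synthetic two-class data: 30 samples, 200 variables, weak effect in 5 of them.
rng = np.random.default_rng(0)
n, p = 30, 200
y = np.repeat([-1.0, 1.0], n // 2)
X = rng.normal(size=(n, p))
X[y > 0, :5] += 0.8

q2_original = q2_kfold(X, y)
q2_values = []
for _ in range(20):
    perm = rng.permutation(n)         # same data, different row order
    q2_values.append(q2_kfold(X[perm], y[perm]))

print(f"Q2 in original row order: {q2_original:.3f}")
print(f"Q2 over 20 row permutations: min {min(q2_values):.3f}, max {max(q2_values):.3f}")
```

Since the fold assignment depends only on row position, each permutation produces a different split of individuals and therefore, potentially, a different Q2. The printed range gives a rough sense of that instability for this synthetic dataset; repeating the computation over many permutations is one simple way to check whether a Q2 close to 0.5 is robust, which is the kind of stability check the note recommends.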