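Artificial intelligence
Classifier (UML)
Bottleneck
Autoencoder
Metabolomics
Pattern recognition (psychology)
Chemistry
Deep learning
Machine learning
Analyte
Software
Computer science
Chromatography
Embedded system
Programming language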
Authors
Ethan Stancliffe, Gary J. Patti
Identifier
DOI:10.1021/acs.analchem.3c00764
Abstract
Peak-detection algorithms currently used to process untargeted metabolomics data were designed to maximize sensitivity at the expense of selectivity. Peak lists returned by conventional software tools therefore contain a high density of artifacts that do not represent real chemical analytes and that, in turn, hinder downstream analyses. Although some innovative approaches to remove artifacts have recently been introduced, they involve extensive user intervention due to the diversity of peak shapes present within and across metabolomics data sets. To address this bottleneck in metabolomics data processing, we developed a semisupervised deep learning-based approach, PeakDetective, for classification of detected peaks as artifacts or true peaks. Our approach utilizes two techniques for artifact removal. First, an unsupervised autoencoder is used to extract a low-dimensional, latent representation of each peak. Second, a classifier is trained with active learning to discriminate between artifacts and true peaks. Through active learning, the classifier is trained with fewer than 100 user-labeled peaks in a matter of minutes. Given the speed of its training, PeakDetective can be rapidly tailored to specific LC/MS methods and sample types to maximize performance on each type of data set. In addition to curation, the trained models can also be used for peak detection, identifying peaks directly with both high sensitivity and high selectivity. We validated PeakDetective on five diverse LC/MS data sets, where PeakDetective showed greater accuracy than current approaches. When applied to a SARS-CoV-2 data set, PeakDetective enabled more statistically significant metabolites to be detected. PeakDetective is open source and available as a Python package at https://github.com/pattilab/PeakDetective.
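The second step described in the abstract, training a classifier with active learning on latent peak representations, can be illustrated with a minimal sketch. This is not the PeakDetective API or the authors' implementation. Several pieces are assumptions made only for illustration: the query strategy (uncertainty sampling, which the abstract does not specify), the classifier (a small logistic regression), the synthetic 2-D "latent" vectors standing in for autoencoder output, and all function names (`train_logistic`, `predict_proba`, `active_learning`, `oracle`).

```python
import math
import random


def sigmoid(z):
    # Clamp the input so math.exp cannot overflow.
    z = max(-60.0, min(60.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, epochs=200, lr=0.5):
    """Fit a logistic regression with plain batch gradient descent."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(model, x):
    """Probability that latent vector x is a true peak (label 1)."""
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def active_learning(latent, oracle, n_initial=10, n_rounds=10, batch=5, seed=0):
    """Uncertainty sampling: repeatedly label the peaks the model is least sure of.

    `oracle(i)` stands in for the user labeling peak i (1 = true peak, 0 = artifact).
    """
    rng = random.Random(seed)
    unlabeled = list(range(len(latent)))
    rng.shuffle(unlabeled)
    labels = {i: oracle(i) for i in unlabeled[:n_initial]}
    unlabeled = unlabeled[n_initial:]

    def fit():
        idx = list(labels)
        return train_logistic([latent[i] for i in idx], [labels[i] for i in idx])

    model = fit()
    for _ in range(n_rounds):
        # Peaks with predicted probability closest to 0.5 are the most uncertain.
        unlabeled.sort(key=lambda i: abs(predict_proba(model, latent[i]) - 0.5))
        query, unlabeled = unlabeled[:batch], unlabeled[batch:]
        for i in query:
            labels[i] = oracle(i)
        model = fit()
    return model, len(labels)


# Synthetic stand-in for autoencoder latent vectors (assumption, not real data):
# true peaks cluster around (2, 2), artifacts around (-2, -2).
data_rng = random.Random(42)
latent, truth = [], []
for label, center in ((1, 2.0), (0, -2.0)):
    for _ in range(200):
        latent.append([data_rng.gauss(center, 1.0), data_rng.gauss(center, 1.0)])
        truth.append(label)

model, n_labeled = active_learning(latent, oracle=lambda i: truth[i])
predictions = [1 if predict_proba(model, x) >= 0.5 else 0 for x in latent]
accuracy = sum(p == t for p, t in zip(predictions, truth)) / len(truth)
print(f"labeled peaks: {n_labeled}, accuracy on all peaks: {accuracy:.3f}")
```

With the defaults above, the loop requests 10 initial labels plus 10 rounds of 5 queries, so 60 labels in total. That budget was chosen only to stay under the "fewer than 100 user-labeled peaks" figure quoted in the abstract. In a real workflow the oracle would be a person inspecting each queried peak rather than a lookup into known labels.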