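Metastasis
Classifier (UML)
Computer science
Artificial intelligence
Ensemble learning
Machine learning
Transcriptome
Biomarker discovery
Cross-validation
Biomarker
Computational biology
Gene
Bioinformatics
Cancer
Gene expression
Biology
Proteomics
Medicine
Internal medicine
Biochemistry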
Authors
Jian Ruan,Shuaishuai Xu,Ruyin Chen,Wenxin Qu,Qiong Li,Chanqi Ye,Wei Wu,Qi Jiang,Feifei Yan,Enhui Shen,Qinjie Chu,Yunlu Jia,Xiaochen Zhang,Wenguang Fu,Jinzhang Chen,Michael P. Timko,Peng Zhao,Longjiang Fan,Yifei Shen
Abstract
Robust strategies to identify patients at high risk for tumor metastasis, such as those frequently observed in intrahepatic cholangiocarcinoma (ICC), remain limited. Gene/protein expression profiling holds great potential as an approach to cancer diagnosis and prognosis. However, previously developed protocols that use multiple diagnostic signatures for expression-based metastasis prediction have not been widely adopted, because batch effects and differences between data types greatly reduce the predictive performance of expression profile-based signatures in interlaboratory and cross-data-type validation. To address this problem and assist in more precise diagnosis, we performed a genome-wide integrative proteome and transcriptome analysis and developed an ensemble machine learning-based integration algorithm for metastasis prediction (EMLI-Metastasis) and risk stratification (EMLI-Prognosis) in ICC. Based on large proteome (216) and transcriptome (244) data sets, 132 feature (biomarker) genes were selected and used to train the EMLI-Metastasis algorithm. To accurately detect metastasis in ICC patients, we developed a weighted ensemble machine learning method based on the k-Top Scoring Pairs (k-TSP) method. This approach generates a metastasis classifier for each bootstrap-aggregated (bagging) training data set. Ten binary, expression rank-based classifiers were generated, each detecting metastasis separately. To further improve the accuracy of the method, the 10 binary metastasis classifiers were combined by weighted voting based on the score from the prediction results of each classifier. The prediction accuracy of the EMLI-Metastasis algorithm reached 97.1% and 85.0% in the proteome and transcriptome data sets, respectively. Among the 132 feature genes, 21 gene-pair signatures were developed to establish a metastasis-related prognosis risk-stratification model in ICC (EMLI-Prognosis). Based on the EMLI-Prognosis algorithm, patients in the high-risk group had significantly worse overall survival than those in the low-risk group in the clinical cohort (P-value < 0.05). Taken together, the EMLI-ICC algorithm provides a powerful and robust means for accurate metastasis prediction and risk stratification across proteome and transcriptome data types that is superior to currently used clinicopathological features in patients with ICC. Our algorithm could have profound implications not only for improved clinical care in cancer metastasis risk prediction, but also more broadly for the development of machine-learning-based multi-cohort diagnosis methods. To make the EMLI-ICC algorithm easily accessible for clinical application, we established a web-based server for metastasis risk prediction (http://ibi.zju.edu.cn/EMLI/).
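
The ensemble idea described in the abstract (k-TSP classifiers trained on bootstrap samples, then combined by weighted voting) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names (`fit_ktsp`, `predict_ktsp`, `fit_ensemble`, `predict_ensemble`), the default of k = 3 pairs, and the choice to weight each classifier by its accuracy on its own bootstrap sample are all assumptions made for illustration; the abstract does not specify how the voting score is computed. Because each rule only compares two genes within the same sample, it depends on expression ranks rather than absolute values, which is the usual reason given for the robustness of TSP-style classifiers to scale differences between data sets.

```python
import numpy as np


def fit_ktsp(X, y, k=3):
    """Select k disjoint gene pairs (i, j) such that 'X[:, i] < X[:, j]'
    votes for class 1. X is samples x genes, y holds 0/1 labels."""
    n_genes = X.shape[1]
    # less[s, i, j] is True when gene i is below gene j in sample s
    less = X[:, :, None] < X[:, None, :]
    # Difference in ordering frequency between the two classes
    delta = less[y == 1].mean(axis=0) - less[y == 0].mean(axis=0)
    rows, cols = np.triu_indices(n_genes, k=1)
    order = np.argsort(-np.abs(delta[rows, cols]))
    pairs, used = [], set()
    for idx in order:
        i, j = int(rows[idx]), int(cols[idx])
        if i in used or j in used:
            continue  # keep pairs disjoint, as in standard k-TSP
        # Orient the pair so that "first < second" indicates class 1
        pairs.append((i, j) if delta[i, j] > 0 else (j, i))
        used.update((i, j))
        if len(pairs) == k:
            break
    return pairs


def predict_ktsp(pairs, X):
    """Majority vote over the pair rules; returns 0/1 per sample."""
    votes = np.array([X[:, i] < X[:, j] for i, j in pairs])
    return (votes.mean(axis=0) > 0.5).astype(int)


def fit_ensemble(X, y, n_classifiers=10, k=3, seed=0):
    """Train one k-TSP classifier per bootstrap sample (bagging).
    ASSUMPTION: each weight is the accuracy on that bootstrap sample."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    models = []
    while len(models) < n_classifiers:
        idx = rng.integers(0, n, size=n)  # sample with replacement
        if np.unique(y[idx]).size < 2:
            continue  # both classes are needed to score pairs
        pairs = fit_ktsp(X[idx], y[idx], k)
        weight = float((predict_ktsp(pairs, X[idx]) == y[idx]).mean())
        models.append((pairs, weight))
    return models


def predict_ensemble(models, X):
    """Weighted vote: each classifier contributes +weight or -weight."""
    score = sum(w * (2 * predict_ktsp(p, X) - 1) for p, w in models)
    return (score > 0).astype(int)
```

A call such as `models = fit_ensemble(X, y)` followed by `predict_ensemble(models, X_new)` returns one 0/1 label per row of `X_new`. An odd k avoids tied votes within a single classifier; out-of-bag accuracy would be a reasonable alternative weighting under the same assumptions.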