Authors
Fubo Wang, Chengbang Wang, Shaohua Chen, Chunmeng Wei, Jin Ji, Yan Liu, Leifeng Liang, Yifeng Chen, Xing Li, Lin Zhao, Xiaolei Shi, Fang Yu, Weimin Lu, Tianman Li, Zhe Liu, Wenhao Lu, Tingting Li, Xiangui Hu, Meimei Li, Fuchen Liu
Abstract
Cancer remains a leading global cause of mortality, making early detection crucial for improving survival outcomes. This study aimed to develop a machine learning-enabled platform for multi-cancer detection and localization based on blood-derived exosomal RNA profiling. In the discovery phase of this multi-phase, multi-center study, we analyzed RNA from exosomes derived from peripheral blood plasma in 818 participants across eight cancer types. Machine learning techniques were applied to identify potential pan-cancer biomarkers. During the screening and model validation phases, the sample size was expanded in two steps to 1,385 participants, while the candidate biomarkers were refined into a set of 12 exosomal tumor RNA signatures (ETR.sig). In the subsequent model construction phase, diagnostic models were developed using the expanded cohort and ETR.sig. Statistical analyses included receiver operating characteristic (ROC) curves and area under the curve (AUC) values to assess the models' ability to distinguish cancer cases from controls and to determine tumor origin. To further validate the identified biomarkers and explore their biological relevance, we integrated tissue RNA-seq, single-cell data, and clinical information. Machine learning analysis initially identified 33 candidate biomarkers, which were narrowed down to 20 ETR.sig in the screening phase and 12 ETR.sig in the validation phase. In the model construction phase, a diagnostic model based on ETR.sig and built with the Random Forest (RF) algorithm achieved an AUC of 0.915 for distinguishing pan-cancer cases from controls. The multi-class classification model also performed strongly, with macro-average and micro-average AUCs of 0.983 and 0.985, respectively, for differentiating between the eight cancer types.
Additionally, tumor origin classification using the RF-based diagnostic models achieved high AUC values: BRCA 0.976, COAD 0.98, KIRC 0.947, LIHC 0.967, LUAD 0.853, OV 0.972, PAAD 0.977, and PRAD 0.898. Integration of tissue RNA-seq, single-cell data, and clinical information revealed key associations between ETR.sig-related genes and tumor development. This study demonstrates the potential of exosomal RNA as a minimally invasive biomarker resource for cancer detection. The ETR.sig platform offers a promising tool for precision oncology and broad-spectrum cancer screening, combining computational models with nanoscale vesicle biology for accurate and rapid diagnosis.
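
The macro-average and micro-average AUCs reported above summarize a multi-class model in two different ways, and a small worked example may make the distinction concrete. The sketch below is not the authors' pipeline: it assumes a one-vs-rest scheme, uses a tiny invented three-class data set rather than study data, and the function names `binary_auc` and `macro_micro_auc` are hypothetical. The macro average is the unweighted mean of the per-class AUCs, while the micro average pools every (sample, class) pair into a single binary problem before computing one AUC.

```python
from typing import List, Sequence, Tuple


def binary_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random
    negative, with ties counted as one half (rank-based definition)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def macro_micro_auc(
    y_true: Sequence[int], proba: Sequence[Sequence[float]], n_classes: int
) -> Tuple[List[float], float, float]:
    """Return (per-class one-vs-rest AUCs, macro average, micro average)."""
    per_class: List[float] = []
    flat_labels: List[int] = []
    flat_scores: List[float] = []
    for k in range(n_classes):
        # One-vs-rest: class k is "positive", every other class is "negative".
        labels_k = [1 if y == k else 0 for y in y_true]
        scores_k = [row[k] for row in proba]
        per_class.append(binary_auc(labels_k, scores_k))
        # Pool all (sample, class) pairs for the micro average.
        flat_labels.extend(labels_k)
        flat_scores.extend(scores_k)
    macro = sum(per_class) / n_classes
    micro = binary_auc(flat_labels, flat_scores)
    return per_class, macro, micro


# Invented toy data: 6 samples, 3 classes, predicted class probabilities.
y_true = [0, 0, 1, 1, 2, 2]
proba = [
    [0.7, 0.2, 0.1],
    [0.4, 0.5, 0.1],
    [0.2, 0.6, 0.2],
    [0.3, 0.4, 0.3],
    [0.1, 0.2, 0.7],
    [0.2, 0.3, 0.5],
]

per_class, macro, micro = macro_micro_auc(y_true, proba, n_classes=3)
print("per-class AUC:", per_class)
print("macro-average AUC:", round(macro, 3))
print("micro-average AUC:", round(micro, 3))
```

Because the macro average weights each class equally and the micro average weights each sample equally, the two values can diverge when class sizes are imbalanced or when one class is much harder to separate than the others.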