Risk-stratified classification of pulmonary nodule malignancy via a machine learning model integrating imaging and cell-free DNA: a model development and validation study (DECIPHER-NODL)
Summary

Background: Accurate risk stratification of pulmonary nodules is critical for early lung cancer detection. This study aimed to improve malignancy classification and invasiveness prediction using machine learning models integrating low-dose computed tomography (LDCT) radiomics and plasma cell-free DNA (cfDNA) fragmentomics.

Methods: This multicenter study enrolled 1356 participants across discovery (n = 1147) and external validation (n = 209) cohorts. A deep learning-based imaging model processed LDCT scans for automated lung nodule detection and malignancy classification. A parallel cfDNA model analyzed four whole-genome fragmentation features: copy number variation, fragment size ratio, fragment-based methylation, and mutation context and signature. The two models were integrated via a stacked ensemble algorithm. An invasion prediction model evaluated tumor aggressiveness.

Findings: The integrated imaging-cfDNA model outperformed individual models, with an AUC of 0.950 (95% CI: 0.926–0.975) in the internal test set and 0.966 (95% CI: 0.940–0.991) in the external validation. The combined model's specificity increased to 0.60 (95% CI: 0.49–0.71) while maintaining 95% sensitivity, compared to specificities of 0.50 (95% CI: 0.41–0.59) and 0.33 (95% CI: 0.23–0.44) at equivalent sensitivity levels for the imaging and cfDNA models, respectively. The combined model consistently outperformed the other two models across nodule characteristics, with particular improvement for 10–20 mm and pure solid nodules. The invasion prediction model stratified lung cancers with an AUC of 0.884 (internal) and 0.880 (external). Prediction scores increased stepwise with tumor aggressiveness, from adenocarcinoma in situ to minimally invasive adenocarcinoma, and were highest for invasive adenocarcinoma.

Interpretation: This multimodal approach enhances pulmonary nodule risk stratification by integrating radiomic and molecular biomarkers. The model significantly improves diagnostic accuracy, potentially reducing unnecessary procedures while minimizing missed diagnoses, supporting its clinical utility in lung cancer screening.

Funding: Noncommunicable Chronic Diseases-National Science and Technology Major Project, National Key Research & Development Programme, China National Science Foundation, the Science and Technology Planning Project of Guangzhou, and Guangzhou National Laboratory.
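
The Summary describes two steps that a short sketch may make more concrete: combining the imaging and cfDNA scores in a stacked ensemble, and reporting specificity at a fixed 95% sensitivity. The Python below is a minimal illustration under stated assumptions, not the study's implementation. The Summary does not specify the meta-learner, so a logistic regression fitted by gradient descent is assumed here. The function names (`fit_stacker`, `stacked_score`, `specificity_at_sensitivity`) are hypothetical, and the scores in the usage section are synthetic toy values, not study data.

```python
import math


def fit_stacker(imaging, cfdna, y_true, lr=0.5, epochs=2000):
    """Fit a logistic meta-learner on two base-model scores.

    Assumption: the meta-learner is a logistic regression trained with
    plain batch gradient descent. Returns (w_imaging, w_cfdna, bias).
    """
    w_img = w_cf = bias = 0.0
    n = len(y_true)
    for _ in range(epochs):
        g_img = g_cf = g_bias = 0.0
        for x_img, x_cf, y in zip(imaging, cfdna, y_true):
            p = 1.0 / (1.0 + math.exp(-(w_img * x_img + w_cf * x_cf + bias)))
            err = p - y  # gradient of the log-loss with respect to the logit
            g_img += err * x_img
            g_cf += err * x_cf
            g_bias += err
        w_img -= lr * g_img / n
        w_cf -= lr * g_cf / n
        bias -= lr * g_bias / n
    return w_img, w_cf, bias


def stacked_score(x_img, x_cf, params):
    """Combined malignancy score from the two base-model scores."""
    w_img, w_cf, bias = params
    return 1.0 / (1.0 + math.exp(-(w_img * x_img + w_cf * x_cf + bias)))


def specificity_at_sensitivity(y_true, scores, target_sensitivity=0.95):
    """Return (threshold, sensitivity, specificity) at the strictest
    threshold whose sensitivity is at least target_sensitivity.

    Cases with score >= threshold are called malignant (label 1).
    """
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    negatives = [s for y, s in zip(y_true, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one malignant and one benign case")
    # Sensitivity can only grow as the threshold drops, so the first
    # threshold that reaches the target is the strictest valid one.
    for threshold in sorted(set(scores), reverse=True):
        sensitivity = sum(s >= threshold for s in positives) / len(positives)
        if sensitivity >= target_sensitivity:
            specificity = sum(s < threshold for s in negatives) / len(negatives)
            return threshold, sensitivity, specificity
    raise ValueError("target_sensitivity must not exceed 1.0")


if __name__ == "__main__":
    # Synthetic toy scores (not study data): 4 malignant, then 4 benign.
    labels = [1, 1, 1, 1, 0, 0, 0, 0]
    imaging = [0.9, 0.8, 0.3, 0.7, 0.2, 0.6, 0.1, 0.3]
    cfdna = [0.8, 0.3, 0.9, 0.7, 0.1, 0.1, 0.6, 0.3]

    params = fit_stacker(imaging, cfdna, labels)
    combined = [stacked_score(a, b, params) for a, b in zip(imaging, cfdna)]
    for name, scores in [("imaging", imaging), ("cfDNA", cfdna), ("combined", combined)]:
        _, sens, spec = specificity_at_sensitivity(labels, scores)
        print(f"{name}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

For brevity the sketch fits and evaluates the meta-learner on the same toy cases. In practice a stacked ensemble is usually trained on out-of-fold base-model predictions and evaluated on held-out data, so that the combined score is not optimistically biased.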