Authors
Ya Li, Xia Ji, Tony He, Yong‐Jie Hu, Daobin Zhou, Dan Zou, Benlan Li, Min Zhang, Zhongjun Huang, M. Zhang, Xuzhen Liu, Minfang Wang, Hongyan Luo, Fangyang Lu, Chuan Zhang, Xingxing Zhao, Shengfa Su, Jie Peng
Abstract
Background This study aimed to develop and validate an interpretable machine learning model that uses circulating tumor DNA (ctDNA) to predict progression-free survival (PFS) in patients with non-small cell lung cancer (NSCLC) undergoing immunotherapy, thereby addressing the limitations of conventional biomarkers such as PD-L1 expression and tumor mutational burden.
Methods This multicenter study involved pretreatment ctDNA profiling of 441 patients with NSCLC, divided into three independent cohorts: a training set (n=303, OAK trial), a validation set (n=97, POPLAR trial), and a local test set (n=41, multicenter retrospective cohort, 2023–2024). Using 5-fold cross-validated LASSO-Cox (Least Absolute Shrinkage and Selection Operator-Cox Proportional Hazards) regression, 25 prognostic genomic features were selected and used as inputs to an eXtreme Gradient Boosting (XGBoost) model. Model performance was evaluated in three ways: (1) discrimination metrics, including AUC with 95% confidence intervals, accuracy, sensitivity, and specificity; (2) Kaplan-Meier survival analysis with log-rank testing; and (3) SHapley Additive exPlanations (SHAP) for interpreting feature importance.
Results The model achieved AUCs of 0.82 (training cohort), 0.79 (validation cohort), and 0.77 (test cohort). Key genomic predictors included TP53 mutations, which were associated with shorter PFS, and BRCA2 mutations, which correlated with longer PFS. SHAP analysis identified NOTCH1 as a novel predictive biomarker, whose feature contribution profile suggests a role in immune modulation in lung squamous cell carcinoma. Risk stratification significantly distinguished PFS outcomes (log-rank P < 0.05). Decision curve analysis confirmed the model’s clinical utility, as it outperformed “treat-all” strategies.
Conclusion This study establishes a robust, interpretable ctDNA-derived machine learning algorithm for predicting PFS in NSCLC patients receiving immune checkpoint inhibitors. The identification of TP53, BRCA2, and NOTCH1 as biologically plausible predictive biomarkers advances understanding of immunotherapy response mechanisms and enables clinically actionable risk stratification to guide therapeutic decision-making. These findings underscore the need for prospective multicenter validation to facilitate translation into precision oncology practice.
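Illustrative sketch. The abstract reports AUCs with 95% confidence intervals and a decision curve comparison against a "treat-all" strategy, but does not describe how either was computed. The Python sketch below shows one common way to obtain such quantities, and should not be read as the authors' implementation. It makes several assumptions: the binary labels (for example, progression by a fixed time point), the toy scores, the 0.5 threshold, the percentile bootstrap, and all function names are hypothetical choices made for illustration.

```python
import random


def auc(labels, scores):
    """AUC as the probability that a random positive case outscores
    a random negative case (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the AUC (assumed method)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # resample contains one class only; AUC is undefined
        stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


def net_benefit(labels, scores, threshold):
    """Net benefit of treating cases whose score is at or above the threshold:
    TP/n - FP/n * pt / (1 - pt), the quantity plotted in a decision curve."""
    n = len(labels)
    tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


def net_benefit_treat_all(labels, threshold):
    """Net benefit of treating every patient regardless of score."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Toy data (hypothetical, not from the study): 1 = event, 0 = no event.
labels = [1, 1, 1, 0, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]

point_auc = auc(labels, scores)
ci_low, ci_high = bootstrap_auc_ci(labels, scores)
nb_model = net_benefit(labels, scores, threshold=0.5)
nb_all = net_benefit_treat_all(labels, threshold=0.5)
print(f"AUC = {point_auc:.4f}, 95% CI = ({ci_low:.4f}, {ci_high:.4f})")
print(f"Net benefit at 0.5: model = {nb_model:.3f}, treat-all = {nb_all:.3f}")
```

On this toy data the model's net benefit at the chosen threshold is higher than that of treating everyone, which is the kind of comparison a decision curve repeats across a range of thresholds. With cohorts as small as the local test set (n=41), bootstrap intervals of this kind would be expected to be wide.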