Authors
Tianrui Kuang, Rui Wang, Yunwei Sun, Pan Gao, He Cai, Yongbing Li, Xin Wang, Yunqiang Cai, Jin Zhou, Bing Peng, Zhong Wu
Abstract
Background: Pancreatic cancer (PC) is a highly aggressive solid tumor with a 5-year postoperative survival rate below 10%. Surgical resection remains the only potentially curative treatment, but its outcome is heavily influenced by the tumor's biological heterogeneity, which leads to considerable variation in patient prognosis. A particular challenge in PC is the diversity of its pathological subtypes, including pancreatic ductal adenocarcinoma (PDAC) and pancreatic adenosquamous carcinoma (PASC). PDAC is the most common form, whereas PASC is a rare but more aggressive variant with a worse prognosis. Although the two subtypes are treated with similar strategies, they exhibit distinct molecular profiles, and their responses to therapy can vary significantly. This study aims to develop a machine learning-based prognostic tool that integrates multidimensional clinical data to overcome the limitations of single biomarkers, providing a more personalized and dynamic approach to monitoring postoperative survival.

Methods: Two feature selection techniques (Boruta and LASSO) were applied to clinical and laboratory data from patients who underwent surgery for pancreatic cancer to identify key survival-related variables. Nine commonly used machine learning models, including XGBoost, LightGBM, logistic regression, and random forest, were built on the seven most important features and compared for predicting postoperative overall survival. We comprehensively evaluated the stability of the XGBoost sub-models, feature interpretability (via SHAP analysis), and generalizability across pathological subtypes.
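LASSO, one of the two feature-selection techniques named in the Methods, selects variables by shrinking the coefficients of uninformative predictors exactly to zero. The abstract does not state the loss function, penalty strength, or tuning procedure used, so the following is only a minimal pure-Python sketch under assumed choices: a squared-error LASSO fitted by cyclic coordinate descent on standardized predictors, with a fixed penalty `lam` and a hypothetical synthetic dataset in which only `feature_0` drives the outcome. All function names are illustrative, not the authors' code. For a survival endpoint, a penalized Cox or logistic model with a cross-validated penalty (for example via glmnet) would be the more typical choice.

```python
import random
from typing import List, Sequence, Tuple


def soft_threshold(z: float, gamma: float) -> float:
    """Proximal step for the L1 penalty: shrink z toward 0 by gamma."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def standardize_columns(X: Sequence[Sequence[float]]) -> List[List[float]]:
    """Return the columns of X (column-major) rescaled to mean 0 and variance 1."""
    n, p = len(X), len(X[0])
    cols = []
    for j in range(p):
        col = [row[j] for row in X]
        mu = sum(col) / n
        sd = (sum((v - mu) ** 2 for v in col) / n) ** 0.5
        cols.append([(v - mu) / sd if sd > 0 else 0.0 for v in col])
    return cols


def lasso_cd(X: Sequence[Sequence[float]], y: Sequence[float],
             lam: float, n_iter: int = 200) -> List[float]:
    """Minimise (1/2n)*||y - Xb||^2 + lam*||b||_1 by cyclic coordinate descent.

    Coefficients are on the standardized scale; the intercept is the mean of y.
    """
    cols = standardize_columns(X)
    n = len(y)
    y_mean = sum(y) / n
    r = [yi - y_mean for yi in y]  # residual when all coefficients are 0
    beta = [0.0] * len(cols)
    for _ in range(n_iter):
        for j, xj in enumerate(cols):
            # Inner product of x_j with the partial residual (x_j's own term added
            # back); this form relies on (1/n) * sum(x_ij**2) == 1 after scaling.
            rho = sum(a * b for a, b in zip(xj, r)) / n + beta[j]
            new_bj = soft_threshold(rho, lam)
            delta = new_bj - beta[j]
            if delta != 0.0:
                r = [ri - delta * xij for ri, xij in zip(r, xj)]
                beta[j] = new_bj
    return beta


def select_features(names: Sequence[str], X: Sequence[Sequence[float]],
                    y: Sequence[float], lam: float) -> List[str]:
    """Keep the features whose LASSO coefficient is non-zero."""
    beta = lasso_cd(X, y, lam)
    return [name for name, b in zip(names, beta) if b != 0.0]


def make_synthetic_data(seed: int = 42, n: int = 300, p: int = 6
                        ) -> Tuple[List[str], List[List[float]], List[float]]:
    """Hypothetical data: the outcome depends only on feature_0 plus small noise."""
    rng = random.Random(seed)
    X = [[rng.random() for _ in range(p)] for _ in range(n)]
    y = [3.0 * row[0] + rng.gauss(0.0, 0.1) for row in X]
    names = [f"feature_{j}" for j in range(p)]
    return names, X, y
```

On the synthetic data, `select_features(names, X, y, lam=0.2)` should keep only `feature_0`; raising `lam` discards more variables, which is how a LASSO filter narrows a long candidate list before model fitting.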
Results: Among all models, XGBoost performed best on the validation set (AUROC = 0.796) and showed consistent, stable performance for 1-, 3-, and 5-year survival prediction (AUROC = 0.593, 0.699, and 0.774, respectively). SHAP analysis showed that basophils, pathology type, DB, CA125, and HDL contributed most to the predictions. Subtype analyses in PDAC and PASC further confirmed the robustness and broad applicability of the model (AUROC = 0.730 and 0.822, respectively). The core features were related to clinical pathways, the tumor microenvironment, and patients' systemic status, which enhanced the model's explanatory power and potential for clinical translation.

Conclusion: The XGBoost-based machine learning model developed in this study integrates clinical, laboratory, and pathological data and provides superior predictive performance and good interpretability for assessing postoperative survival in pancreatic cancer. The model not only offers robust predictions for the general population but also effectively captures differences in risk across pathological subtypes. It holds substantial clinical value for postoperative risk stratification and treatment decision-making.
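The 1-, 3-, and 5-year and subtype-specific AUROC values reported in the Results can be read as landmark discrimination measures. The abstract does not say how censoring was handled, so the sketch below makes an explicit simplifying assumption: at each horizon, a patient is a case if death occurred on or before it and a control if follow-up extends past it, and patients censored before the horizon are dropped (estimators based on inverse-probability-of-censoring weighting would keep and reweight them instead). AUROC is computed with the Mann-Whitney statistic. The function names and the `subset` argument used for subtype analysis are hypothetical.

```python
from typing import List, Optional, Sequence, Set, Tuple


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Mann-Whitney AUROC: P(a random case outscores a random control); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one case and one control")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def landmark_labels(times: Sequence[float], events: Sequence[int],
                    horizon: float) -> List[Tuple[int, int]]:
    """Binary status at a landmark horizon, as (patient index, label) pairs.

    1 = died on or before the horizon; 0 = still followed after the horizon.
    Patients censored on or before the horizon have unknown status and are
    dropped (simplifying assumption).
    """
    pairs = []
    for i, (t, e) in enumerate(zip(times, events)):
        if t <= horizon and e == 1:
            pairs.append((i, 1))
        elif t > horizon:
            pairs.append((i, 0))
    return pairs


def landmark_auroc(risk: Sequence[float], times: Sequence[float],
                   events: Sequence[int], horizon: float,
                   subset: Optional[Set[int]] = None) -> float:
    """AUROC of a risk score at one horizon, optionally within a subgroup of patients."""
    pairs = landmark_labels(times, events, horizon)
    if subset is not None:
        pairs = [(i, y) for i, y in pairs if i in subset]
    return auroc([risk[i] for i, _ in pairs], [y for _, y in pairs])
```

For a subtype analysis, `subset` would hold the indices of the PDAC (or PASC) patients, for example `{i for i, s in enumerate(subtypes) if s == "PDAC"}` with a hypothetical per-patient `subtypes` list; calling `landmark_auroc` at horizons 1, 3, and 5 gives the per-year values.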