Predictive Modeling for Survival Outcomes in Surgically Resected Pancreatic Ductal Adenocarcinoma: A Comprehensive Machine Learning Approach Using Real-World Data
Authors
Kaleem S. Ahmed, Sheriff M Issaka, Benjamin A. Y. Cher, Clayton T. Marcinak, Syed Nabeel Zafar
Identifier
DOI: 10.1177/2993091x251386458
Abstract
Introduction: Surgery for pancreatic ductal adenocarcinoma (PDAC) is highly morbid, so appropriate patient selection is crucial. Machine learning models have demonstrated potential for predicting outcomes and facilitating decision-making. We sought to develop and validate machine learning models for 1-year survival in patients with surgically resected PDAC using data from a large, multicenter, real-world electronic health record (EHR) database.
Methods: This was a retrospective cohort study using the American Society of Clinical Oncology CancerLinQ Discovery® Pancreatic Cancer Dataset. The study population included patients with PDAC who underwent surgical resection from 1998 to 2021. Data were abstracted from the EHR, considering only information available prior to surgery. The primary outcome was survival at 1 year post-resection. Five machine learning models were developed using a robust feature selection process. Predictive accuracy was assessed using the area under the curve (AUC) in a hold-out dataset.
Results: The study included 1,567 patients who underwent curative-intent pancreatectomy, of whom 870 (55.5%) survived at least 1 year. The gradient boosting (GB) model performed best, achieving an AUC of 0.78, sensitivity of 90%, specificity of 44%, and positive/negative predictive values of 0.65/0.80. Feature importance analysis identified type of operation, receipt of chemotherapy, tumor size, and ethnicity as the most important predictors.
Conclusions: The most accurate model predicted 1-year survival with higher accuracy than previously published models. This study adds to ongoing efforts to predict post-resection outcomes and to generate useful data that facilitate patient selection for resection. The study also demonstrates the opportunities and challenges of applying machine learning techniques to EHR data.
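As a reading aid, the relationship between the reported sensitivity, specificity, and predictive values can be sketched with Bayes' rule. This is an illustrative calculation, not the authors' code. It assumes that the proportion of 1-year survivors in the hold-out dataset matches the full cohort (870 of 1,567), which the abstract does not state, and that "positive" means predicted survival at 1 year. The function name predictive_values is hypothetical.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) implied by sensitivity, specificity, and prevalence.

    Each term below is the expected fraction of all patients falling in
    one cell of the confusion matrix.
    """
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Assumption: hold-out prevalence equals the full-cohort survival proportion.
prevalence = 870 / 1567

# Sensitivity and specificity as reported for the gradient boosting model.
ppv, npv = predictive_values(0.90, 0.44, prevalence)
print(f"prevalence={prevalence:.3f}  PPV={ppv:.2f}  NPV={npv:.2f}")
```

Under this assumption the implied values land close to, though not exactly on, the reported 0.65/0.80. A gap of that size is plausible if the hold-out prevalence differed slightly from the full cohort or if the reported sensitivity and specificity were rounded.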