Predicting Lymph Node Metastasis in Rectal Cancer: Development and Validation of a Machine Learning Model Using Clinical Data

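Keywords: Receiver operating characteristic; Random forest; Medicine; Nomogram; Artificial intelligence; Logistic regression; Machine learning; Univariate; Brier score; Multilayer perceptron; Computer science; Artificial neural network; Multivariate statistics; Oncology; Internal medicine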

Authors

Wei Hou, Chuangwei Li, Zhen Wang, Wanqin Wang, Shouhong Wan, Bingbing Zou

Source

Journal: JMIR Medical Informatics [JMIR Publications]
Date: 2025-09-23; Volume/issue: 13: e73765

Links

jmir.org, nih.gov, doi.org

Identifier

DOI: 10.2196/73765

Abstract

Background: Rectal cancer (RC) is a common malignant tumor, and lymph node metastasis (LNM) is a critical determinant of patient prognosis. Traditional diagnostic methods have limitations, necessitating the development of predictive models using clinical data.

Objective: This study aimed to construct and validate machine learning (ML) models to predict LNM risk in patients with RC based on clinical data.

Methods: Retrospective data from 2454 patients with RC in the SEER (Surveillance, Epidemiology, and End Results) database were split into training (n=1954) and internal validation (n=500) sets. An external cohort (n=500) was obtained from the First Affiliated Hospital of Anhui Medical University. Lymph node features identified via computed tomographic scans were integrated with clinicopathological data. Variables were selected using LASSO (Least Absolute Shrinkage and Selection Operator), followed by univariate and multivariate logistic regression. Eleven ML models (Logistic Regression, K-Nearest Neighbors, Extremely Randomized Trees, Naive Bayes, XGBoost [XGB], Light Gradient Boosting Machine, Multilayer Perceptron, Gradient Boosting, Support Vector Machine, Random Forest, and AdaBoost) were evaluated via area under the receiver operating characteristic curve (AUC), calibration curves, and decision curve analysis.

Results: LNM prevalence was 26.9% (training), 27% (internal validation), and 81% (external validation). Independent LNM predictors included tumor grade, clinical T stage, N stage, tumor length, neural invasion, and total lymph nodes. Internal validation AUC ranged from 0.859 to 0.964; external validation AUC ranged from 0.735 to 0.838. In the internal validation set, Random Forest and Extremely Randomized Trees achieved the highest AUC (0.964, 95% CI 0.950-0.978), while XGB demonstrated superior cross-cohort stability (AUC 0.942, 95% CI 0.925-0.959). For external validation, Gradient Boosting had the highest AUC (0.838, 95% CI 0.801-0.875), followed by XGB (0.832, 95% CI 0.794-0.869). XGB showed minimal calibration error, with curves closest to the ideal diagonal, and yielded the highest net benefit in decision curve analysis across critical thresholds.

Conclusions: This study developed and validated 11 ML models to predict LNM risk in RC. The XGB model was optimal overall; 10 models achieved an AUC >0.9 in internal validation, and 7 achieved an AUC >0.8 in external validation. The identified predictors of LNM can facilitate early diagnosis and personalized treatment, highlighting the potential of integrating computed tomographic scan data with clinicopathological findings to build effective predictive models.
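The three evaluation criteria named in the abstract (discrimination via AUC, calibration, and net benefit from decision curve analysis) can be illustrated with a minimal pure-Python sketch. This is not the authors' code. The labels and predicted probabilities below are made-up toy values rather than study data, the function names are hypothetical, and the Brier score is used here as one common summary of calibration error, an assumption on our part since the abstract mentions only calibration curves.

```python
from typing import Sequence


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def brier_score(y_true: Sequence[int], probs: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and observed outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, probs)) / len(y_true)


def net_benefit(y_true: Sequence[int], probs: Sequence[float], threshold: float) -> float:
    """Net benefit at one probability threshold, as used in decision curve analysis:
    TP/n - FP/n * threshold / (1 - threshold)."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, probs) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


# Toy example (hypothetical values): 1 = LNM present, 0 = LNM absent.
y_toy = [1, 1, 1, 0, 0, 0, 0, 1]
p_toy = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.1, 0.7]

if __name__ == "__main__":
    print("AUC:", auc(y_toy, p_toy))
    print("Brier score:", brier_score(y_toy, p_toy))
    print("Net benefit at 0.5:", net_benefit(y_toy, p_toy, 0.5))
```

On the toy data, 15 of the 16 positive/negative pairs are ordered correctly, giving an AUC of 0.9375, and the net benefit at a threshold of 0.5 is 0.25. A decision curve is obtained by repeating the net benefit calculation over a range of thresholds and comparing against the treat-all and treat-none strategies.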
