Supervised Machine Learning Models for Predicting Sepsis-Associated Liver Injury in Patients With Sepsis: Development and Validation Study Based on a Multicenter Cohort Study

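Keywords: machine learning; artificial intelligence; random forest; decision tree; support vector machine; medicine; sepsis; logistic regression; multilayer perceptron; gradient boosting; intensive care unit; computer science; emergency medicine; intensive care medicine; artificial neural network; internal medicine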

Authors

Jingchao Lei, Jia Zhai, Yao Zhang, Jing Qi, Chuanzheng Sun

Source

Journal: Journal of Medical Internet Research [JMIR Publications]
Date: 2025-05-26 Volume/Issue: 27: e66733-e66733 Citations: 6

Identifier

DOI: 10.2196/66733

Abstract

Background: Sepsis-associated liver injury (SALI) is a severe complication of sepsis that contributes to increased mortality and morbidity. Early identification of SALI can improve patient outcomes; however, sepsis heterogeneity makes timely diagnosis challenging. Traditional diagnostic tools are often limited, and machine learning techniques offer promising solutions for predicting adverse outcomes in patients with sepsis.

Objective: This study aims to develop an explainable machine learning model, incorporating stacking techniques, to predict the occurrence of liver injury in patients with sepsis and provide decision support for early intervention and personalized treatment strategies.

Methods: This retrospective multicenter cohort study adhered to the TRIPOD+AI (Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis, Extended for Artificial Intelligence) guidelines. Data from 8834 patients with sepsis in the Medical Information Mart for Intensive Care IV (MIMIC-IV) database were used for training and internal validation, while data from 4236 patients in the eICU Collaborative Research Database (eICU-CRD) were used for external validation. SALI was defined as an international normalized ratio >1.5 and total bilirubin >2 mg/dL within 1 week of intensive care unit admission. Nine machine learning models were trained: decision tree, random forest (RF), extreme gradient boosting (XGBoost), light gradient boosting machine (LightGBM), support vector machine, elastic net, logistic regression, multilayer perceptron, and k-nearest neighbors. A stacking ensemble model, using LightGBM, XGBoost, and RF as base learners and Lasso regression as the meta-model, was optimized via 10-fold cross-validation. Hyperparameters were tuned using grid search and Bayesian optimization. Model performance was evaluated using accuracy, balanced accuracy, Brier score, detection prevalence, F1-score, Jaccard index, κ coefficient, Matthews correlation coefficient, negative predictive value, positive predictive value, precision, recall, area under the receiver operating characteristic curve (ROC-AUC), precision-recall AUC, and decision curve analysis. Shapley additive explanations (SHAP) values were used to quantify feature importance.

Results: In the training set, LightGBM, XGBoost, and RF demonstrated the best performance among all models, with ROC-AUCs of 0.9977, 0.9311, and 0.9847, respectively. These models exhibited minimal variance in cross-validation, with tightly clustered ROC-AUC and precision-recall AUC distributions. In the internal validation set, LightGBM (ROC-AUC 0.8401) and XGBoost (ROC-AUC 0.8403) outperformed all other models, while RF achieved an ROC-AUC of 0.8193. In the external validation set, LightGBM (ROC-AUC 0.7077), XGBoost (ROC-AUC 0.7169), and RF (ROC-AUC 0.7081) maintained strong performance, although ROC-AUC was lower than in the training set. The stacking model achieved ROC-AUCs of 0.995, 0.838, and 0.721 in the training, internal validation, and external validation sets, respectively. Key predictors (total bilirubin, lactate, prothrombin time, and mechanical ventilation status) were consistently identified across models, with SHAP analysis highlighting their significant contributions to the model’s predictions.

Conclusions: The stacking ensemble model developed in this study yields accurate and robust predictions of SALI in patients with sepsis, demonstrating potential clinical utility for early intervention and personalized treatment strategies.
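As an illustration only, the SALI definition quoted in the abstract (international normalized ratio >1.5 and total bilirubin >2 mg/dL within 1 week of intensive care unit admission) and two of the listed metrics (Brier score and Matthews correlation coefficient) can be sketched in plain Python. This is a minimal sketch, not the authors' code: the `LabResult` record and its field names are hypothetical, and the sketch assumes that the two thresholds may be met on different lab draws inside the window, which the abstract does not specify.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Optional, Sequence

# Thresholds and window as quoted in the abstract.
INR_THRESHOLD = 1.5
BILIRUBIN_THRESHOLD_MG_DL = 2.0
WINDOW_DAYS = 7.0  # "within 1 week of intensive care unit admission"


@dataclass
class LabResult:
    """One lab draw. Field names are hypothetical, not MIMIC-IV or eICU-CRD columns."""
    days_since_icu_admission: float
    inr: Optional[float] = None
    total_bilirubin_mg_dl: Optional[float] = None


def has_sali(labs: Sequence[LabResult]) -> bool:
    """Return True if both thresholds are exceeded inside the 1-week window.

    Assumption: the two criteria need not come from the same lab draw.
    """
    in_window = [r for r in labs if 0 <= r.days_since_icu_admission <= WINDOW_DAYS]
    high_inr = any(r.inr is not None and r.inr > INR_THRESHOLD for r in in_window)
    high_bilirubin = any(
        r.total_bilirubin_mg_dl is not None
        and r.total_bilirubin_mg_dl > BILIRUBIN_THRESHOLD_MG_DL
        for r in in_window
    )
    return high_inr and high_bilirubin


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def matthews_corrcoef(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Matthews correlation coefficient for binary 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: return 0.0 when any marginal count is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom
```

A patient with an INR of 1.8 on day 1 and a total bilirubin of 2.5 mg/dL on day 3 would be labeled as SALI under these assumptions; the same bilirubin value on day 9 would fall outside the window and would not count.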
