Authors
Gangfeng Zhu, Yipeng Song, Zenghong Lu, Qiang Yi, Rui Xu, Yi Xie, Shi Geng, Na Yang, Liangjian Zheng, Xiaofei Feng, Rui Zhu, Xiangcai Wang, Li‐Min Huang, Yi Xiang
Abstract
Metabolic dysfunction-associated steatotic liver disease (MASLD) is a global health concern that requires early screening and timely intervention to improve prognosis. Current diagnostic protocols for MASLD involve complex procedures performed in specialised medical centres. This study explored the feasibility of using machine learning models to screen accurately for MASLD in large populations from a combination of essential demographic and clinical characteristics. A total of 10,007 outpatients who underwent transient elastography at the First Affiliated Hospital of Gannan Medical University were enrolled to form a derivation cohort. Using eight demographic and clinical characteristics (age, educational level, height, weight, waist circumference, hip circumference, history of hypertension, and history of diabetes), we built predictive models for MASLD severity (none or mild: controlled attenuation parameter (CAP) ≤ 269 dB/m; moderate: CAP 269-296 dB/m; severe: CAP > 296 dB/m) with 10 machine learning algorithms: logistic regression (LR), multilayer perceptron (MLP), extreme gradient boosting (XGBoost), bootstrap aggregating, decision tree, K-nearest neighbours, light gradient boosting machine, naive Bayes, random forest, and support vector machine. These models were externally validated using the National Health and Nutrition Examination Survey (NHANES) 2017-2023 datasets. In the hospital outpatient cohort, the machine learning algorithms demonstrated robust predictive capabilities. Notably, LR achieved the highest accuracy (ACC) of 0.711 in the test cohort and 0.728 in the validation cohort, with areas under the receiver operating characteristic curve (AUC) of 0.798 and 0.806, respectively. Similarly, MLP and XGBoost showed promising results, with MLP achieving an ACC of 0.735 in the test cohort and XGBoost registering an AUC of 0.798.
External validation using the NHANES datasets yielded consistent AUC results, with LR (0.831), MLP (0.823), and XGBoost (0.784) performing robustly. This study demonstrated that machine learning models constructed using a combination of essential demographic and clinical characteristics can accurately screen for MASLD in the general population. This approach improves the feasibility and accessibility of MASLD screening, as well as compliance with it, and provides an effective tool for large-scale health assessments and early intervention strategies.
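The three-level outcome described in the abstract can be illustrated with a minimal sketch of the CAP grading rule. The function name `cap_grade` and the label strings are hypothetical and do not come from the study. The handling of the boundaries is inferred from the stated cut-offs (CAP ≤ 269 dB/m is none or mild and CAP > 296 dB/m is severe, so a value of exactly 296 dB/m is treated as moderate here).

```python
def cap_grade(cap_db_per_m: float) -> str:
    """Map a CAP reading (dB/m) to the three severity grades in the abstract.

    Hypothetical helper: the label strings are illustrative, and the
    boundary handling is inferred from the published cut-offs.
    """
    if cap_db_per_m <= 269:
        return "none_or_mild"   # CAP <= 269 dB/m
    if cap_db_per_m <= 296:
        return "moderate"       # 269 < CAP <= 296 dB/m (assumed inclusive at 296)
    return "severe"             # CAP > 296 dB/m


# Example usage with made-up readings (not study data).
readings = [250.0, 269.0, 280.5, 296.0, 310.0]
grades = [cap_grade(r) for r in readings]
```

A rule of this kind would supply the target label for each outpatient, with the eight demographic and clinical characteristics serving as the model inputs.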