Authors
Yongxin Chen, Si-Yi Chen, Wenjie Tang, Qingcong Kong, Zhidan Zhong, Xiaomeng Yu, Yi Sui, Wenke Hu, Xinqing Jiang, Yuan Guo
Abstract
BACKGROUND. MRI radiomics has been explored for three-tiered classification of HER2 expression levels (i.e., HER2-zero, HER2-low, or HER2-positive) in patients with breast cancer, although an understanding of how such models reach their predictions is lacking.
OBJECTIVE. The purpose of this study was to develop and test multiparametric MRI radiomics machine learning models for differentiating three-tiered HER2 expression levels in patients with breast cancer, as well as to explain the contributions of model features through local and global interpretations with the use of Shapley additive explanation (SHAP) analysis.
METHODS. This retrospective study included 737 patients (mean age, 54.1 ± 10.6 [SD] years) with breast cancer from two centers (center 1 [n = 578] and center 2 [n = 159]), all of whom underwent multiparametric breast MRI and had HER2 expression determined after excisional biopsy. Analysis entailed two tasks: differentiating HER2-negative (i.e., HER2-zero or HER2-low) tumors from HER2-positive tumors (task 1) and differentiating HER2-zero tumors from HER2-low tumors (task 2). For each task, patients from center 1 were randomly assigned in a 7:3 ratio to a training set (task 1: n = 405; task 2: n = 284) or an internal test set (task 1: n = 173; task 2: n = 122); patients from center 2 formed an external test set (task 1: n = 159; task 2: n = 105). Radiomic features were extracted from early phase dynamic contrast-enhanced (DCE) imaging, T2-weighted imaging, and DWI. For each task, a support vector machine (SVM) was used for feature selection, a multiparametric radiomics score (radscore) was computed using feature weights from SVM correlation coefficients, conventional MRI and combined models were constructed, and model performances were evaluated. SHAP analysis was used to provide local and global interpretations of the model outputs.
RESULTS. In the external test set, for task 1, AUCs for the conventional MRI model, radscore, and the combined model were 0.624, 0.757, and 0.762, respectively; for task 2, the AUC for radscore was 0.754, and no conventional MRI model or combined model could be constructed. SHAP analysis identified early phase DCE imaging features as having the strongest influence for both tasks; T2-weighted imaging features also had a prominent role for task 2.
CONCLUSION. The findings indicate suboptimal performance of MRI radiomics models for noninvasive characterization of HER2 expression.
CLINICAL IMPACT. The study provides an example of the use of SHAP interpretation analysis to better understand predictions of imaging-based machine learning models.
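The local and global SHAP interpretations mentioned in the abstract can be illustrated with a minimal sketch. It assumes the radscore behaves as a linear weighted sum of features and that features are treated as independent, in which case each feature's SHAP value has a closed form. The abstract does not state which SHAP explainer the authors used, so this is not their implementation. All weights, feature values, and function names below are hypothetical.

```python
from statistics import mean


def radscore(weights, intercept, features):
    """Linear score: intercept plus the weighted sum of feature values."""
    return intercept + sum(w * x for w, x in zip(weights, features))


def linear_shap(weights, features, background):
    """Per-feature SHAP values for a linear score, assuming independent features.

    For f(x) = b + sum_i w_i * x_i, the contribution of feature i is
    w_i * (x_i - mean of feature i over the background set).
    """
    means = [mean(col) for col in zip(*background)]
    return [w * (x - m) for w, x, m in zip(weights, features, means)]


def global_importance(weights, samples, background):
    """Global view: mean absolute SHAP value per feature across samples."""
    per_sample = [linear_shap(weights, s, background) for s in samples]
    return [mean(abs(v) for v in col) for col in zip(*per_sample)]


# Hypothetical toy data: three features per patient, imagined here as one
# feature each from DCE, T2-weighted imaging, and DWI (illustration only).
weights = [0.8, -0.3, 0.1]
intercept = -0.2
background = [
    [0.0, 1.0, 2.0],
    [1.0, 0.0, 2.0],
    [2.0, 2.0, 2.0],
]
patient = [2.0, 0.0, 2.0]

# Local interpretation: how each feature moves this patient's score
# away from the average score over the background set.
local = linear_shap(weights, patient, background)
baseline = mean(radscore(weights, intercept, row) for row in background)

# Global interpretation: average magnitude of each feature's contribution.
overall = global_importance(weights, background, background)
```

In this sketch the local contributions sum to the difference between the patient's score and the baseline score, which is the additivity property that makes SHAP values readable as a per-feature breakdown of a single prediction.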