Background Accurate preoperative prediction of pathological complete response (pCR) following neoadjuvant chemoimmunotherapy (nCIT) could help individualize treatment for patients with esophageal squamous cell carcinoma (ESCC). This study aimed to develop and externally validate an interpretable multimodal machine learning framework that integrates CT radiomics with pathomics derived from H&E-stained whole-slide images to predict pCR.
Methods In this multicenter retrospective study, 335 patients with ESCC who received nCIT followed by esophagectomy were enrolled from three institutions. Patients from one center were divided into a training set (181 patients) and an internal test set (115 patients), and patients from the other two centers formed an external test set (39 patients). We developed unimodal radiomics and pathomics models and two multimodal fusion models: an intermediate fusion model (MIFM) and a late fusion model (MLFM). Model performance was evaluated using the area under the curve (AUC), accuracy, sensitivity, specificity, and F1 score, with exploratory survival stratification by observed and model-predicted pCR status. Interpretability was treated as a design constraint and operationalized at both the feature and model levels.
Results The MIFM outperformed the unimodal models and the MLFM across all cohorts, achieving AUC/accuracy/sensitivity/specificity/F1 score of 0.97/0.93/0.84/0.96/0.86 (training set), 0.78/0.87/0.62/0.93/0.63 (internal test set), and 0.76/0.77/0.54/0.88/0.61 (external test set). In exploratory analyses, both observed and model-predicted pCR status stratified patients by overall survival. Feature definitions were mathematically or morphologically explicit, and case-level and cohort-level explanations, together with decision-pathway views, provided insight into model reasoning. We also provide a graphical user interface to facilitate clinical use.
Conclusions We developed and externally validated an interpretable radiopathomics fusion framework that predicts pCR after nCIT in ESCC using standard-of-care data. This model holds promise as an effective tool for guiding individualized decisions between surveillance and timely surgery.
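For readers less familiar with the reported metrics, the following is a minimal sketch of how AUC, accuracy, sensitivity, specificity, and F1 score can be computed for a binary pCR classifier. It is not the study's evaluation code: the function names, the 0.5 decision threshold, and the eight toy labels and scores are all illustrative assumptions, and none of the values are study data.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float
    sensitivity: float
    specificity: float
    f1: float


def _ratio(numerator, denominator):
    # Return NaN instead of raising when a class is absent.
    return numerator / denominator if denominator else float("nan")


def binary_metrics(y_true, y_pred):
    """Threshold-based metrics for binary labels (1 = pCR, 0 = non-pCR)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return BinaryMetrics(
        accuracy=_ratio(tp + tn, tp + tn + fp + fn),
        sensitivity=_ratio(tp, tp + fn),   # recall on pCR cases
        specificity=_ratio(tn, tn + fp),   # recall on non-pCR cases
        f1=_ratio(2 * tp, 2 * tp + fp + fn),
    )


def auc(y_true, scores):
    """AUC as the probability that a random positive case is scored
    above a random negative case (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        return float("nan")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical, NOT from the study): 3 pCR and 5 non-pCR cases.
y_true = [1, 1, 0, 0, 0, 1, 0, 0]
scores = [0.9, 0.4, 0.2, 0.6, 0.1, 0.8, 0.3, 0.05]

THRESHOLD = 0.5  # assumed cut-off; the abstract does not state one
y_pred = [1 if s >= THRESHOLD else 0 for s in scores]

metrics = binary_metrics(y_true, y_pred)
toy_auc = auc(y_true, scores)
print(metrics, round(toy_auc, 3))
```

AUC is computed from the continuous scores and does not depend on the threshold, whereas the other four metrics do. This is one reason a model can report a reasonable AUC alongside sensitivity that is noticeably lower than specificity, the pattern seen in all three cohorts in the Results.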