Determining whether hepatic cystic echinococcosis (HCE) lesions are biologically active is essential for treatment planning. This study evaluates the performance of a Transformer-based fusion model in predicting HCE lesion activity. CT images and clinical variables from 700 HCE patients treated at three hospitals between 2018 and 2023 were analyzed. Clinical variables were selected by univariate and multivariate logistic regression to construct a clinical model. Radiomic features were extracted from the CT images with Pyradiomics to develop a radiomics model, and a 2D and a 3D deep learning model were trained on the same images. Three multimodal fusion models were then constructed, using feature-level fusion, decision-level fusion, and a Transformer network architecture, respectively. The discriminative ability of, and the correlation among, the radiomic, 2D deep learning, and 3D deep learning features were analyzed, and the classification performance of the three fusion models was compared. The 3D deep learning features discriminated the biological activity of HCE lesions better than the radiomic and 2D deep learning features. The Transformer-based fusion model achieved the highest performance in both the test set and the external validation set, with AUC values of 0.997 (0.992-1.000) and 0.944 (0.911-0.977), respectively, outperforming both the feature-level and decision-level fusion models. By integrating clinical features, radiomic features, and both 2D and 3D deep learning features, the Transformer multimodal fusion model differentiates the biological activity of HCE lesions accurately and shows potential for clinical application.
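
The three fusion strategies compared in this study can be contrasted in a minimal sketch. It is an illustration under stated assumptions, not the authors' implementation: the function names, the toy feature values, and the use of identity projections with mean pooling are all hypothetical, and each modality is assumed to have already been projected to a shared embedding size before attention is applied. A trained Transformer would use learned query, key, and value projections, multiple heads, and a classification head.

```python
import math


def feature_level_fusion(modalities):
    """Feature-level fusion: concatenate per-modality feature vectors."""
    fused = []
    for vec in modalities:
        fused.extend(vec)
    return fused


def decision_level_fusion(probabilities, weights=None):
    """Decision-level fusion: weighted average of per-model probabilities."""
    if weights is None:
        weights = [1.0] * len(probabilities)
    total = sum(weights)
    return sum(p * w for p, w in zip(probabilities, weights)) / total


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    norm = sum(exps)
    return [e / norm for e in exps]


def attention_fusion(tokens):
    """Attention-style fusion: one modality = one token.

    Single-head scaled dot-product self-attention with identity
    projections (an assumption made for brevity), followed by mean
    pooling over the attended tokens.
    """
    dim = len(tokens[0])
    scale = math.sqrt(dim)
    attended = []
    for query in tokens:
        scores = [
            sum(q * k for q, k in zip(query, key)) / scale for key in tokens
        ]
        weights = softmax(scores)
        attended.append(
            [
                sum(weights[j] * tokens[j][i] for j in range(len(tokens)))
                for i in range(dim)
            ]
        )
    return [sum(tok[i] for tok in attended) / len(attended) for i in range(dim)]


# Toy, made-up embeddings for the four modalities named in the abstract.
clinical = [0.2, 0.8]
radiomic = [0.5, 0.1]
deep_2d = [0.9, 0.4]
deep_3d = [0.7, 0.6]
modalities = [clinical, radiomic, deep_2d, deep_3d]

concatenated = feature_level_fusion(modalities)          # one long vector
averaged = decision_level_fusion([0.6, 0.7, 0.8, 0.9])   # fused probability
pooled = attention_fusion(modalities)                    # attention-weighted vector
```

The contrast is in where the modalities interact: concatenation leaves all interaction to the downstream classifier, probability averaging combines models only after each has made its decision, and attention lets every modality reweight the others before pooling.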