Artificial intelligence techniques play a pivotal role in accurately identifying drug-drug interaction (DDI) events, thereby informing clinical decisions and treatment regimens. Although existing DDI prediction models have made significant progress by leveraging sequence features such as chemical substructures, targets, and enzymes, they often struggle to integrate and effectively exploit multi-modal drug representations. To address this limitation, we propose MMDDI-SSE, a novel multi-modal feature fusion model for DDI event prediction. MMDDI-SSE integrates the drug sequence modality with DDI graph representations, using static subgraph generation to capture structural properties. A graph autoencoder learns both local and global topological features from these subgraphs, while the model simultaneously processes diverse sequence-based characteristics, including semantically enhanced pharmacodynamic features, chemical substructures, target proteins, and enzyme information. In comprehensive evaluations on two distinct datasets, MMDDI-SSE outperforms state-of-the-art baselines. Ablation studies further confirm that each architectural component contributes to DDI prediction accuracy. The implementation code and datasets are available at https://github.com/Tomchen1231/MMDDI-SSE.
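The abstract describes the graph branch only at a high level. As a hedged illustration of the general pattern (not the MMDDI-SSE implementation, which lives in the linked repository), the sketch below assumes that a "static subgraph" is the fixed k-hop neighbourhood of a drug pair in the DDI graph, encodes it with one GCN layer and an inner-product decoder in the style of a standard graph autoencoder, and fuses the resulting node embeddings with per-drug sequence feature vectors by concatenation. The function names `k_hop_subgraph`, `gcn_encode`, `gae_reconstruct`, and `fuse_pair`, the toy graph, and the random features are all hypothetical.

```python
import numpy as np

def k_hop_subgraph(adj, seeds, k):
    """Hypothetical 'static subgraph': sorted node indices within k hops of any seed."""
    nodes = set(seeds)
    frontier = set(seeds)
    for _ in range(k):
        nxt = set()
        for u in frontier:
            nxt.update(np.nonzero(adj[u])[0].tolist())  # neighbours of u
        frontier = nxt - nodes
        nodes |= nxt
    return sorted(nodes)

def gcn_encode(adj, x, w):
    """One GCN layer: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    a_hat = adj + np.eye(adj.shape[0])          # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(a_norm @ x @ w, 0.0)

def gae_reconstruct(z):
    """Inner-product decoder: sigmoid(Z Z^T) gives edge probabilities."""
    return 1.0 / (1.0 + np.exp(-(z @ z.T)))

def fuse_pair(z, idx_a, idx_b, seq_a, seq_b):
    """Assumed fusion: concatenate graph embeddings and sequence features of both drugs."""
    return np.concatenate([z[idx_a], z[idx_b], seq_a, seq_b])

# Toy DDI graph: a path 0-1-2-3-4 (assumption, for illustration only).
adj = np.array([[0, 1, 0, 0, 0],
                [1, 0, 1, 0, 0],
                [0, 1, 0, 1, 0],
                [0, 0, 1, 0, 1],
                [0, 0, 0, 1, 0]], dtype=float)

rng = np.random.default_rng(0)
nodes = k_hop_subgraph(adj, seeds=[0, 1], k=1)   # subgraph around drug pair (0, 1)
sub_adj = adj[np.ix_(nodes, nodes)]
x = rng.normal(size=(len(nodes), 4))             # hypothetical node features
w = rng.normal(size=(4, 3))                      # untrained encoder weights
z = gcn_encode(sub_adj, x, w)
probs = gae_reconstruct(z)
pair_vec = fuse_pair(z, nodes.index(0), nodes.index(1),
                     seq_a=np.ones(6), seq_b=np.zeros(6))
print(nodes, z.shape, probs.shape, pair_vec.shape)
```

In a trained model the encoder weights would be fitted by minimising a reconstruction loss between `probs` and `sub_adj`, and `pair_vec` would feed a classifier over DDI event types; both steps are omitted here, and the actual fusion strategy used by MMDDI-SSE may differ.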