Abstract
Given the challenges of capturing temporal dependencies in sports event data and the imbalance between global and local feature representations, this study introduces a Transformer-based model designed to address these issues. By leveraging a multi-head self-attention mechanism, the model effectively captures dynamic features across different time granularities, thereby enhancing the analysis of temporal event data and improving the accuracy of win-rate prediction. Specifically, a time-segment encoding strategy first partitions the event sequence so that the features within each temporal segment can be processed independently. A multi-level Transformer architecture is then constructed to extract both short-term and long-term dependencies at different hierarchical levels, yielding a more comprehensive view of game dynamics. To further refine feature representation, a dynamic self-attention adjustment mechanism is incorporated, allowing the model to adaptively focus on salient features according to the characteristics of the input data. Experimental results show that the proposed model outperforms six baselines, namely Logistic Regression (LR), Support Vector Machine (SVM), Random Forest (RF), Long Short-Term Memory (LSTM), Convolutional Neural Network (CNN), and Extreme Gradient Boosting (XGBoost), improving prediction accuracy by 10.7%, 8.3%, 3.9%, 6.0%, 4.3%, and 2.4%, respectively, and precision by 10.6%, 9.4%, 5.0%, 6.5%, 4.5%, and 3.6%, respectively. These findings underscore the model's effectiveness in handling complex temporal sequences and multi-layered feature structures, substantially improving the accuracy and robustness of win-rate prediction for sports events.
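To make the described pipeline concrete, the following is a minimal PyTorch sketch of the two-level design: events are partitioned into fixed-length temporal segments, a local Transformer encoder models short-term dependencies within each segment, and a second encoder over the pooled segment summaries models long-term game dynamics. Everything here is an illustrative assumption rather than the paper's implementation: the class name `HierarchicalWinPredictor`, the segment length, model width, head counts, layer depths, and mean pooling are all placeholders, and the dynamic self-attention adjustment mechanism and positional encodings are omitted for brevity.

```python
import torch
import torch.nn as nn

class HierarchicalWinPredictor(nn.Module):
    """Illustrative sketch of a two-level (segment-wise / cross-segment)
    Transformer for win-rate prediction. Hyperparameters are assumptions."""

    def __init__(self, feat_dim: int, d_model: int = 64, n_heads: int = 4,
                 seg_len: int = 10):
        super().__init__()
        self.seg_len = seg_len
        self.embed = nn.Linear(feat_dim, d_model)  # project raw event features
        # Level 1: short-term dependencies within each temporal segment.
        local_layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.local_encoder = nn.TransformerEncoder(local_layer, num_layers=2)
        # Level 2: long-term dependencies across segment summaries.
        global_layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.global_encoder = nn.TransformerEncoder(global_layer, num_layers=2)
        self.head = nn.Linear(d_model, 1)  # win-probability logit

    def forward(self, events: torch.Tensor) -> torch.Tensor:
        # events: (batch, seq_len, feat_dim); seq_len divisible by seg_len here.
        b, t, _ = events.shape
        assert t % self.seg_len == 0, "pad/truncate sequences in practice"
        x = self.embed(events)
        # Partition the sequence into segments and encode each independently.
        x = x.view(b * (t // self.seg_len), self.seg_len, -1)
        x = self.local_encoder(x)
        seg = x.mean(dim=1)                       # pool each segment to one vector
        seg = seg.view(b, t // self.seg_len, -1)  # (batch, n_segments, d_model)
        # Model long-range dynamics over the segment sequence.
        g = self.global_encoder(seg)
        return torch.sigmoid(self.head(g.mean(dim=1))).squeeze(-1)

# Usage: 8 matches, 40 events each, 12 features per event.
model = HierarchicalWinPredictor(feat_dim=12)
p_win = model(torch.randn(8, 40, 12))  # win probabilities, shape (8,)
```

Processing segments independently at the first level keeps self-attention cost quadratic only in the segment length rather than the full sequence length, which is one plausible motivation for the hierarchical split described above.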