Abstract

Accurate estimation of the state of health of lithium-ion batteries is essential for reliable measurement and monitoring in energy storage systems. However, battery capacity degradation is strongly nonlinear and subject to stochastic fluctuations, particularly those caused by capacity regeneration, which limits the modeling accuracy and generalization capability of traditional convolutional networks and single-feature extraction methods. To address this issue, this paper proposes a hybrid prediction framework built on a hierarchical feature dynamic fusion (HFDF) module, which aims to achieve multi-scale modeling and efficient feature fusion for complex degradation processes. First, the original capacity sequence is adaptively decomposed using time-varying filter empirical mode decomposition, which alleviates the mode mixing problem of traditional decomposition methods. The HFDF module then applies an enhanced attention mechanism and multi-level interactions to extract hierarchical features and adaptively fuse the high-, medium-, and low-frequency components, so that the model remains sensitive to both global degradation trends and local variations while gaining flexibility in multi-scale modeling. Finally, an improved transformer model with a pre-layer-normalization (Pre-LN) structure is integrated to further strengthen the modeling of long-term dependencies. Experiments on the NASA, CALCE, and noise-affected Xi’an Jiaotong University datasets yielded minimum root mean square error values of 0.0035, 0.0046, and 0.0042, respectively. These results demonstrate the superior prediction accuracy, stability, and noise robustness of the proposed framework, and confirm the effectiveness of the HFDF module and the practical value of the framework.
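For orientation, two ingredients named in the abstract can be sketched in a few lines: the root mean square error used as the evaluation metric, and the Pre-LN residual ordering, in which layer normalization is applied before the sublayer rather than after the residual addition. This is a minimal illustration, not the implementation used in this work. The function names (`rmse`, `layer_norm`, `pre_ln_block`, `post_ln_block`) are hypothetical, the learnable gain and bias of layer normalization are omitted, and `sublayer` is a stand-in for an attention or feed-forward layer.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def layer_norm(x: Sequence[float], eps: float = 1e-5) -> Vector:
    """Normalize one feature vector to zero mean and (nearly) unit variance.

    Assumption: no learnable gain or bias, to keep the sketch short.
    """
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def pre_ln_block(x: Vector, sublayer: Callable[[Vector], Vector]) -> Vector:
    """Pre-LN ordering: x + sublayer(LN(x)). The residual path is left unnormalized."""
    return [a + b for a, b in zip(x, sublayer(layer_norm(x)))]


def post_ln_block(x: Vector, sublayer: Callable[[Vector], Vector]) -> Vector:
    """Post-LN ordering, shown for contrast: LN(x + sublayer(x))."""
    return layer_norm([a + b for a, b in zip(x, sublayer(x))])


if __name__ == "__main__":
    # Toy sublayer (hypothetical): scales every feature by 0.5.
    def halve(h: Vector) -> Vector:
        return [0.5 * v for v in h]

    features = [1.0, 2.0, 4.0]
    print("Pre-LN :", pre_ln_block(features, halve))
    print("Post-LN:", post_ln_block(features, halve))
    print("RMSE   :", rmse([0.90, 0.88, 0.85], [0.91, 0.87, 0.85]))
```

In the Pre-LN form the input passes through the residual connection unchanged, which is commonly credited with more stable training of deep transformer stacks. The Post-LN form normalizes the sum and therefore rescales the residual path as well.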