A multi-scale adaptive transformer with feature enhancement for fault diagnosis of rolling bearings under imbalanced small-sample and cross-condition scenarios
As industrial machinery grows more complex and the demand for intelligent fault diagnosis increases, accurately identifying rolling bearing faults under data scarcity and varying operating conditions has become a major challenge. In real-world applications, scarce fault samples and diverse working environments often cause sample imbalance and distribution shifts, which severely degrade the performance of conventional diagnostic models. To address these challenges, a novel multi-scale adaptive transformer (MAT) model is proposed for fault diagnosis of rolling bearings under imbalanced small-sample and cross-condition scenarios. The model combines a multi-scale feature enhancement backbone with the global modeling capability of a hierarchical encoder, so that fine-grained local features and long-range contextual fault representations are extracted simultaneously. Specifically, the feature enhancement backbone incorporates dilated convolutions, spatial pyramid pooling, and a spatial attention mechanism: the dilated convolutions expand the receptive field to capture rich contextual information, the pyramid pooling fuses multi-scale spatial features, and the spatial attention adaptively focuses on fault-relevant regions to suppress noise. This design improves the representational capacity and robustness of the model under data-limited conditions. In the hierarchical encoder, a channel attention residual sublayer adaptively reweights feature dimensions, which increases the sensitivity of the model to critical local features and reduces overfitting. Extensive experiments on the Case Western Reserve University (CWRU) and Paderborn bearing datasets demonstrate that the proposed MAT model significantly outperforms existing mainstream methods in both cross-condition and imbalanced small-sample fault diagnosis tasks.
These results validate the effectiveness and generalization capability of the proposed approach and indicate its potential for practical intelligent manufacturing applications.
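The abstract does not specify layer sizes, the ordering of the backbone components, or the exact attention formulas. The following is therefore only a minimal NumPy sketch of the generic building blocks it names, under stated assumptions. It assumes the order dilated convolution, then pyramid pooling, then spatial attention. It also assumes PSPNet-style pooling that keeps the time axis, CBAM-style spatial attention, and an SE-style channel gate added back through a residual path (out = X + X * gates). All function names and shapes are hypothetical, and the transformer self-attention layers of the encoder are not shown.

```python
import numpy as np


def sigmoid(z):
    # Clip the input so np.exp cannot overflow for large negative scores.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -60.0, 60.0)))


def dilated_conv1d(x, w, dilation=1):
    """'Same'-padded 1D dilated convolution (cross-correlation).

    x: (C_in, L), w: (C_out, C_in, K). Returns (C_out, L).
    Spacing the K taps `dilation` samples apart widens the receptive field
    to dilation * (K - 1) + 1 samples without adding parameters.
    """
    c_out, _, k = w.shape
    length = x.shape[1]
    span = dilation * (k - 1)
    xp = np.pad(x, ((0, 0), (span // 2, span - span // 2)))
    out = np.zeros((c_out, length))
    for j in range(k):
        # Tap j reads the padded input shifted by j * dilation samples.
        out += w[:, :, j] @ xp[:, j * dilation : j * dilation + length]
    return out


def pyramid_pool1d(x, levels=(1, 2, 4)):
    """Assumed PSPNet-style multi-scale pooling that keeps the time axis.

    For each level n, average-pool x into n bins, stretch each bin back to
    its original width, and stack the result under x along the channel axis.
    Requires L >= max(levels). Returns (C * (1 + len(levels)), L).
    """
    maps = [x]
    for n_bins in levels:
        chunks = np.array_split(x, n_bins, axis=1)
        pooled = np.stack([c.mean(axis=1) for c in chunks], axis=1)  # (C, n_bins)
        sizes = [c.shape[1] for c in chunks]
        maps.append(np.repeat(pooled, sizes, axis=1))  # back to (C, L)
    return np.concatenate(maps, axis=0)


def spatial_attention(x, w):
    """Assumed CBAM-style spatial attention: scale each time step by a score in (0, 1).

    x: (C, L), w: (1, 2, K) kernel applied to the channel-wise mean and max.
    """
    descriptor = np.stack([x.mean(axis=0), x.max(axis=0)])  # (2, L)
    scores = sigmoid(dilated_conv1d(descriptor, w))  # (1, L)
    return x * scores


def channel_attention_residual(tokens, w1, w2):
    """Assumed SE-style channel reweighting with a residual path.

    tokens: (T, D), w1: (D // r, D), w2: (D, D // r). Returns (T, D).
    """
    squeeze = tokens.mean(axis=0)  # (D,) global descriptor per feature dimension
    gates = sigmoid(w2 @ np.maximum(w1 @ squeeze, 0.0))  # (D,) weights in (0, 1)
    return tokens + tokens * gates


# Toy forward pass on a random stand-in for one vibration segment.
rng = np.random.default_rng(0)
signal = rng.standard_normal((1, 64))
feat = np.maximum(dilated_conv1d(signal, rng.standard_normal((8, 1, 3)), dilation=2), 0.0)
fused = pyramid_pool1d(feat)  # (32, 64)
attended = spatial_attention(fused, rng.standard_normal((1, 2, 7)))
tokens = attended.T  # (64, 32): one token per time step
out = channel_attention_residual(tokens, rng.standard_normal((8, 32)), rng.standard_normal((32, 8)))
print(out.shape)  # (64, 32)
```

The residual form keeps the original token values even when a gate is close to zero. This is one plausible reading of why a channel attention residual sublayer could reduce overfitting on small samples, not a description of the authors' exact layer.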