Rolling bearings are critical components in rotating machinery, and their performance degrades over time due to operational wear, which may compromise the safety and efficiency of mechanical systems. Accurate and timely fault diagnosis of rolling bearings is therefore crucial. In real-world industrial environments, such diagnosis remains challenging owing to complex and varying operating conditions, under which conventional single-modality deep learning methods are limited and often fail to meet practical demands. To overcome these challenges, this paper proposes a novel fault diagnosis approach based on a Parallel Heterogeneous Deep Network (PHDN-FD). First, the original vibration signals are segmented according to signal pattern similarity. The continuous wavelet transform (CWT) with the Morse wavelet is then applied to convert each one-dimensional signal segment into a two-dimensional time–frequency representation. Subsequently, each signal segment is paired with its corresponding time–frequency representation to form the input to a dual-branch parallel network. One branch, based on the ConvNeXt architecture, extracts spatial features from the time–frequency images, while the other branch employs a 1D-ResNet to capture temporal features from the raw signal segments. The features from both branches are then fused and fed into a three-layer feedforward neural network for final fault classification. Experimental results on the Case Western Reserve University (CWRU) and Korea Advanced Institute of Science and Technology (KAIST) bearing datasets show that the proposed method achieves high diagnostic accuracy even under adverse conditions, such as noise interference, limited training samples, and variable load levels. Moreover, the model exhibits strong cross-load transferability.
By integrating multimodal feature representations, the PHDN-FD framework improves both diagnostic accuracy and model robustness in complex operational scenarios, providing a foundation for industrial deployment.
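As an illustration of the input-preparation stage described above, the following minimal sketch pairs each raw segment with a Morse-wavelet scalogram. It is not the authors' implementation. Several choices are assumptions made only for this example: fixed-length overlapping windows stand in for the similarity-based segmentation, which is not specified here; the Morse parameters `beta=20` and `gamma=3` are a commonly used setting rather than values taken from the paper; and the function names (`morse_scalogram`, `segment`, `make_pairs`), the 12 kHz sampling rate, and the frequency grid are hypothetical.

```python
import numpy as np


def morse_scalogram(x, fs, freqs, beta=20.0, gamma=3.0):
    """Return |CWT| of a 1-D signal using a generalized Morse wavelet.

    The wavelet is defined in the frequency domain as
    Psi(w) = a * w**beta * exp(-w**gamma) for w > 0 and 0 otherwise,
    with `a` chosen so that the peak gain is 2 (analytic wavelet).
    beta and gamma are assumed values, not taken from the paper.
    """
    n = len(x)
    omega = 2.0 * np.pi * np.fft.fftfreq(n, d=1.0 / fs)  # rad/s per FFT bin
    spectrum = np.fft.fft(x)
    w_peak = (beta / gamma) ** (1.0 / gamma)  # peak frequency of the mother wavelet
    a = 2.0 * (np.e * gamma / beta) ** (beta / gamma)  # normalizes the peak gain to 2
    out = np.empty((len(freqs), n))
    for i, f in enumerate(freqs):
        scale = w_peak / (2.0 * np.pi * f)  # scale whose passband is centered on f Hz
        sw = scale * omega
        psi = np.zeros(n)
        pos = sw > 0  # analytic wavelet: negative frequencies are discarded
        # Evaluate in log form to avoid overflow in sw**beta.
        psi[pos] = a * np.exp(beta * np.log(sw[pos]) - sw[pos] ** gamma)
        # Circular convolution via FFT; edges wrap around for non-periodic input.
        out[i] = np.abs(np.fft.ifft(spectrum * psi))
    return out


def segment(signal, length, hop):
    """Fixed-length overlapping windows (a stand-in for similarity-based segmentation)."""
    starts = range(0, len(signal) - length + 1, hop)
    return np.stack([signal[s:s + length] for s in starts])


def make_pairs(signal, fs, freqs, length=1200, hop=600):
    """Pair each raw segment with its time-frequency representation."""
    segs = segment(signal, length, hop)
    tfrs = np.stack([morse_scalogram(seg, fs, freqs) for seg in segs])
    return segs, tfrs


# Usage on a synthetic 1 kHz tone (assumed 12 kHz sampling rate).
fs = 12000
t = np.arange(2400) / fs
sig = np.sin(2.0 * np.pi * 1000.0 * t)
freqs = np.linspace(200.0, 3000.0, 29)  # 100 Hz grid, hypothetical
segs, tfrs = make_pairs(sig, fs, freqs)
print(segs.shape, tfrs.shape)
```

In this sketch, `segs` would feed the one-dimensional branch and `tfrs` the image branch, after any resizing or normalization the respective networks require. The FFT-based convolution is circular, so real signals would typically need padding or edge trimming before the scalograms are used.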