ABSTRACT Recent advances in deep learning (DL) have significantly improved the performance of time‐series forecasting tasks. While recurrent neural networks (RNNs) have traditionally served as the foundation for such models, RNN‐based forecasters often struggle to capture long‐range temporal dependencies and to handle noise, particularly in multivariate, high‐dimensional settings. Transformer‐based architectures have emerged as promising alternatives owing to their ability to model complex temporal patterns across extended sequences. However, most existing transformer‐based forecasting techniques remain limited in mitigating feature noise and ambiguity during long‐sequence representation learning. To overcome these challenges, we propose DT4TS, a novel denoising transformer‐based architecture that integrates a cross‐time/dimension embedding mechanism with a radial basis function neural network (RBFNN) layer to suppress noise during temporal feature extraction. We evaluate DT4TS on a real‐world air quality dataset, focusing on PM2.5 prediction at two monitoring stations in Ho Chi Minh City, Vietnam. Averaged across datasets, DT4TS reduces RMSE by 25.45% and MAE by 16.52% relative to Crossformer, with even larger gains over Autoformer and FEDformer, two state‐of‐the‐art transformer‐based architectures for time‐series learning. These results demonstrate the superior accuracy and robustness of DT4TS for long‐range air quality forecasting, confirming the effectiveness of its noise‐resilient embedding design in capturing fine‐grained temporal dependencies over extended horizons.