The Transformer architecture has demonstrated outstanding performance in image dehazing owing to its global modeling capability. However, the quadratic computational complexity of self-attention becomes a bottleneck when processing high-resolution images, so attention is typically restricted to local computations, which fails to capture global dependencies between distant pixels. To address these challenges, this article proposes a novel diffusion attention mechanism that performs efficient global feature modeling while reducing computational overhead. Specifically, this study adopts a window-based Transformer architecture. First, a subblock partitioning strategy is applied within each local window, where adjacent subblocks share homogeneous attention weights, allowing the window size to be enlarged at constant computational cost. Then, a clarity mixing module is proposed to compensate for the weakened intrablock information interaction. Simultaneously, snake-scan block displacement moves and reorganizes the image in block units without disrupting image continuity, strengthening information exchange between different windows (a minimal sketch of this block-level operation is given below). Additionally, we collected a large-scale nonuniform remote sensing dehazing dataset, RSHaze5K, for evaluating the dehazing capability of networks on heterogeneous images. Experimental results show that the proposed method achieves over 30 dB PSNR on the RSHaze5K dataset, reaching state-of-the-art performance. Compared with the standard Swin Transformer architecture, our model possesses a larger receptive field without increasing computational cost, which is particularly advantageous for high-resolution images.
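
The following PyTorch sketch is only an illustration of the block-level snake-scan displacement idea, not the paper's exact implementation: it partitions a feature map into blocks, traverses them in serpentine order so that consecutive blocks in the sequence remain spatially adjacent, rolls the sequence by a chosen shift, and scatters the blocks back onto the grid. The function name, block size, and shift amount are illustrative assumptions.

```python
import torch


def snake_scan_shift(x: torch.Tensor, block: int, shift: int = 1) -> torch.Tensor:
    """Illustrative block-level snake-scan displacement (assumed form, not the paper's exact op).

    x: feature map of shape (B, C, H, W), with H and W divisible by `block`.
    Blocks are visited in serpentine order (alternating row direction) so that
    neighboring blocks in the 1-D sequence stay spatially adjacent, preserving
    image continuity; the sequence is then rolled by `shift` blocks.
    """
    B, C, H, W = x.shape
    gh, gw = H // block, W // block

    # Split into a (gh, gw) grid of (block, block) patches.
    blocks = x.view(B, C, gh, block, gw, block).permute(0, 1, 2, 4, 3, 5)

    # Serpentine (snake) traversal of the block grid.
    order = []
    for r in range(gh):
        cols = range(gw) if r % 2 == 0 else range(gw - 1, -1, -1)
        order += [(r, c) for c in cols]

    # Flatten blocks along the snake path, then displace by `shift` blocks.
    seq = torch.stack([blocks[:, :, r, c] for r, c in order], dim=2)
    seq = torch.roll(seq, shifts=shift, dims=2)

    # Scatter the displaced blocks back onto the grid and restore (B, C, H, W).
    out = torch.empty_like(blocks)
    for i, (r, c) in enumerate(order):
        out[:, :, r, c] = seq[:, :, i]
    return out.permute(0, 1, 2, 4, 3, 5).reshape(B, C, H, W)


if __name__ == "__main__":
    feat = torch.randn(2, 64, 32, 32)
    shifted = snake_scan_shift(feat, block=8, shift=1)
    print(shifted.shape)  # torch.Size([2, 64, 32, 32])
```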