Fusing a low-resolution hyperspectral image (LRHSI) with a high-resolution multispectral image (HRMSI) is a widely used strategy for generating a high-resolution hyperspectral image (HRHSI). Diffusion models, which progressively denoise their input, capture both global structure and fine detail and can flexibly model complex spectral-spatial relationships. They have shown strong generative capability and promising application potential for hyperspectral-multispectral image (HSI-MSI) fusion. However, two main challenges persist: (1) residual generation receives insufficient guidance from physical priors, leading to spectral and structural distortions; and (2) HRMSI is injected into the denoising network simply as an auxiliary condition, resulting in weak interaction between the high- and low-frequency spatial features of HRMSI and LRHSI. To address these challenges, we propose the Prior-Guided Fusion Diffusion Network (PG-FDN) for HSI-MSI fusion. PG-FDN combines a Prior-Guided Gradient Mechanism (PGGM) with a denoising model. PGGM embeds spectral-frequency priors into the gradient update process, guiding residual generation to reduce spectral distortion and preserve local textures. The denoising model adopts a Bidirectional Progressive Decoder (BPD), which integrates HRMSI spatial features hierarchically through forward–backward feature interaction. Experiments on two synthetic and three real-world datasets show that PG-FDN outperforms six representative methods. Component-wise ablation analyses validate the contribution of each module, and cross-domain evaluations further confirm the robustness and adaptability of PG-FDN. Code and datasets: https://github.com/xiaotaiyang-ops/fusion.
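
For orientation, the relationship between the three image types named above is often described in the fusion literature by a linear observation model: LRHSI is a spatially degraded version of HRHSI, and HRMSI is a spectrally degraded version of it. The sketch below illustrates that setup only; it is not taken from PG-FDN. The box-average downsampling, the uniform spectral response, the function names, and the image sizes are all assumptions chosen for illustration.

```python
import numpy as np

def spatial_degrade(hrhsi, scale):
    """Average each scale x scale block (an assumed blur + downsampling)."""
    h, w, c = hrhsi.shape
    assert h % scale == 0 and w % scale == 0, "sizes must be divisible by scale"
    blocks = hrhsi.reshape(h // scale, scale, w // scale, scale, c)
    return blocks.mean(axis=(1, 3))

def uniform_srf(n_hs_bands, n_ms_bands):
    """Hypothetical spectral response: each multispectral band averages
    one contiguous group of hyperspectral bands."""
    srf = np.zeros((n_hs_bands, n_ms_bands))
    edges = np.linspace(0, n_hs_bands, n_ms_bands + 1).astype(int)
    for j in range(n_ms_bands):
        srf[edges[j]:edges[j + 1], j] = 1.0 / (edges[j + 1] - edges[j])
    return srf

def spectral_degrade(hrhsi, srf):
    """Project the hyperspectral bands onto fewer multispectral bands."""
    return hrhsi @ srf

# Synthetic stand-in for an HRHSI cube: 64 x 64 pixels, 31 bands (assumed sizes).
rng = np.random.default_rng(0)
hrhsi = rng.random((64, 64, 31))

lrhsi = spatial_degrade(hrhsi, scale=4)              # low spatial, full spectral resolution
hrmsi = spectral_degrade(hrhsi, uniform_srf(31, 3))  # full spatial, low spectral resolution

print(lrhsi.shape, hrmsi.shape)  # (16, 16, 31) (64, 64, 3)
```

Fusion is the inverse problem: recovering the full cube from the two degraded observations. Under this kind of model, physical priors constrain the estimate so that degrading it again stays consistent with both inputs.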