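Computer science
Artificial intelligence
Image fusion
Image (mathematics)
Diffusion
Pattern recognition (psychology)
Programming language
Thermodynamics
Physics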
Authors
Linfeng Tang, Chunyu Li, Jiayi Ma
Identifier
DOI:10.1109/tpami.2025.3609323
Abstract
The absence of ground truth (GT) in most fusion tasks poses significant challenges for model optimization, evaluation, and generalization. Existing fusion methods achieving complementary context aggregation predominantly rely on hand-crafted fusion rules and sophisticated loss functions, which introduce subjectivity and often fail to adapt to complex real-world scenarios. To address this challenge, we propose Mask-DiFuser, a novel fusion paradigm that ingeniously transforms the unsupervised image fusion task into a dual masked image reconstruction task by incorporating masked image modeling with a diffusion model, overcoming various issues arising from the absence of GT. In particular, we devise a dual masking scheme to simulate complementary information and employ a diffusion model to restore source images from two masked inputs, thereby aggregating complementary contexts. A content encoder with an attention parallel feature mixer is deployed to extract and integrate complementary features, offering local content guidance. Moreover, a semantic encoder is developed to supply global context which is integrated into the diffusion model via a cross-attention mechanism. During inference, Mask-DiFuser begins with a Gaussian distribution and iteratively denoises it conditioned on multi-source images to directly generate fused images. The masked diffusion model, learning priors from high-quality natural images, ensures that fusion results align more closely with human visual perception. Extensive experiments on several fusion tasks, including infrared-visible, medical, multi-exposure, and multi-focus image fusion, demonstrate that Mask-DiFuser significantly outshines SOTA fusion alternatives.
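The dual masking idea in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the masks are built, so everything here is an assumption made for illustration: the helper name `dual_mask`, the patch-wise random binary mask, the `patch` and `ratio` parameters, and the choice to make the second mask the exact complement of the first. It is not the authors' implementation.

```python
import numpy as np

def dual_mask(image, patch=4, ratio=0.5, seed=0):
    """Split one image into two complementary masked views.

    Hypothetical helper: the patch-wise complementary masking below is an
    assumed scheme, not the one used by Mask-DiFuser.
    """
    h, w = image.shape[:2]
    if h % patch or w % patch:
        raise ValueError("image height and width must be multiples of patch")
    rng = np.random.default_rng(seed)
    # One random keep/drop decision per patch.
    grid = rng.random((h // patch, w // patch)) < ratio
    # Expand the patch grid to pixel resolution.
    mask_a = np.repeat(np.repeat(grid, patch, axis=0), patch, axis=1)
    mask_b = ~mask_a  # assumed: second view keeps exactly what the first drops
    if image.ndim == 3:
        # Broadcast the 2-D masks over the channel axis.
        view_a = image * mask_a[..., None]
        view_b = image * mask_b[..., None]
    else:
        view_a = image * mask_a
        view_b = image * mask_b
    return view_a, view_b, mask_a

image = np.arange(64, dtype=float).reshape(8, 8)
view_a, view_b, mask_a = dual_mask(image)
```

Under this assumed scheme the two views hold disjoint parts of the same image, so the unmasked original can serve as the reconstruction target for a model that receives both views, which mirrors the role the abstract gives to restoring source images from two masked inputs.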