Keywords
Computer science, Autoencoder, Fusion, Artificial intelligence, Inference, Discriminative model, Fusion mechanism, Image fusion, Segmentation, Prior probability, Quality (philosophy), Encoding (memory), Deep learning, Image (mathematics), Machine learning, Pattern recognition (psychology), Computer vision, Bayesian probability, Philosophy, Linguistics, Epistemology, Lipid bilayer fusion
Authors
Wuqiang Qi, Zhuoqun Zhang, Zhishe Wang
Identifier
DOI: 10.62762/cjif.2024.655617
Abstract
Image fusion aims to integrate complementary information from different sensors into a single fused output for superior visual description and scene understanding. Existing GAN-based fusion methods generally suffer from challenges such as an unexplainable mechanism, unstable training, and mode collapse, which can degrade fusion quality. To overcome these limitations, this paper introduces a diffusion model-guided cross-attention learning network, termed DMFuse, for infrared and visible image fusion. First, to improve diffusion inference efficiency, we compress the channels of the denoising UNet fourfold, yielding a more efficient and robust model for fusion tasks. We then employ the pre-trained diffusion model as an autoencoder, incorporating its strong generative priors to train the subsequent fusion network. This design allows the generated diffusion features to effectively capture a high-quality distribution mapping. In addition, we devise a cross-attention interactive fusion module that establishes long-range dependencies from local diffusion features, integrating global interactions to improve the complementary characteristics of the different modalities. Finally, we propose a multi-level decoder network to reconstruct the fused output. Extensive experiments on fusion tasks and downstream applications, including object detection and semantic segmentation, indicate that the proposed model yields promising performance while maintaining competitive computational efficiency. The code will be released at https://github.com/Zhishe-Wang/DMFuse.
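The cross-attention interactive fusion module is the central architectural idea in the abstract: features from each modality query the other to gather long-range, complementary context before decoding. Below is a minimal PyTorch sketch of that idea only; the class name `CrossAttentionFusion`, the tensor shapes, and the two-way `nn.MultiheadAttention` layout are illustrative assumptions, not the authors' released implementation.

```python
# A minimal sketch of cross-attention fusion between infrared and visible
# features. Shapes, names, and the two-way attention layout are assumptions
# for illustration; see the official DMFuse repository for the real code.
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    def __init__(self, dim: int = 64, num_heads: int = 4):
        super().__init__()
        # Infrared features attend to visible features and vice versa,
        # so each modality gathers complementary long-range context.
        self.ir_to_vis = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.vis_to_ir = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.proj = nn.Conv2d(2 * dim, dim, kernel_size=1)

    def forward(self, feat_ir: torch.Tensor, feat_vis: torch.Tensor) -> torch.Tensor:
        b, c, h, w = feat_ir.shape
        # Flatten spatial dimensions into token sequences: (B, H*W, C).
        ir = feat_ir.flatten(2).transpose(1, 2)
        vis = feat_vis.flatten(2).transpose(1, 2)
        # Each modality queries the other for complementary information.
        ir_enh, _ = self.ir_to_vis(query=ir, key=vis, value=vis)
        vis_enh, _ = self.vis_to_ir(query=vis, key=ir, value=ir)
        # Residual connections preserve the local diffusion features.
        ir_out = (ir + ir_enh).transpose(1, 2).reshape(b, c, h, w)
        vis_out = (vis + vis_enh).transpose(1, 2).reshape(b, c, h, w)
        # Concatenate both enhanced maps and project to one fused feature map.
        return self.proj(torch.cat([ir_out, vis_out], dim=1))

if __name__ == "__main__":
    fuse = CrossAttentionFusion(dim=64, num_heads=4)
    ir = torch.randn(1, 64, 32, 32)   # infrared diffusion features
    vis = torch.randn(1, 64, 32, 32)  # visible diffusion features
    print(fuse(ir, vis).shape)        # torch.Size([1, 64, 32, 32])
```

In this sketch the fused map would then go to a multi-level decoder for reconstruction; the residual connections keep local detail while the attention paths contribute the global interactions the abstract describes.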