WaveFusion: A Novel Wavelet Vision Transformer With Saliency-Guided Enhancement for Multimodal Image Fusion

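Artificial intelligence; Computer vision; Computer science; Image fusion; Wavelet; Fusion; Image enhancement; Pattern recognition (psychology); Image (mathematics); Linguistics; Philosophy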

Authors

Qinghua Wang, Ziwei Li, Shuqi Zhang, Nan Chi, Qionghai Dai

Source

Journal: IEEE Transactions on Circuits and Systems for Video Technology [Institute of Electrical and Electronics Engineers]
Date: 2025-03-11; Volume (Issue): 35 (8), pages 7526-7542; Citations: 10

Identifier

DOI: 10.1109/tcsvt.2025.3549459

Abstract

Multi-modal image fusion aims to combine the key information captured by different sensors into a single, informative visual representation of the imaged scene. Fast and accurate fusion is crucial for practical applications such as autonomous driving and medical diagnosis. The primary challenge, however, is to balance computational cost against the effectiveness of feature extraction while robustly integrating salient features across modalities. This paper introduces WaveFusion, a wavelet vision transformer equipped with a saliency-guided loss strategy for multi-modal image fusion. First, to represent multi-modal data comprehensively and efficiently, we introduce an adaptive wavelet transform module for feature decomposition and reconstruction. Self-attention and convolutional networks are then applied in parallel to the low-frequency and high-frequency components, respectively, yielding a wavelet-enhanced vision transformer. Second, WaveFusion uses a dual-aggregation attention approach that improves cross-modal feature complementarity and intra-modal feature coherence within a single fusion module. Third, we propose a dynamic saliency-informed selective loss function that steers optimization toward retaining critical features while maintaining overall image consistency across fusion scenarios. The efficacy and versatility of the method are validated on both infrared-visible and medical image fusion tasks. Experimental results demonstrate that WaveFusion strikes a superior balance between fusion performance and cost-efficiency, and additionally improves performance on downstream tasks such as multi-modal semantic segmentation and object detection.
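The decompose, process-per-band, and reconstruct pattern described in the abstract can be illustrated with a fixed single-level 2D Haar wavelet transform. This is a minimal sketch under stated assumptions, not the authors' method: WaveFusion's wavelet module is described as adaptive, and its low- and high-frequency branches use self-attention and convolution, whereas here the transform is the fixed orthonormal Haar wavelet and the learned branches are replaced by a classic hand-crafted rule (average the low-frequency band, keep the larger-magnitude detail coefficient). The names `haar_dwt2`, `haar_idwt2`, and `fuse` are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def haar_dwt2(img: Matrix) -> Tuple[Matrix, Matrix, Matrix, Matrix]:
    """Single-level orthonormal 2D Haar DWT (fixed, NOT the paper's adaptive module).

    Returns (low, detail_rows, detail_cols, detail_diag); height and width must be even.
    """
    h, w = len(img), len(img[0])
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    low, d_rows, d_cols, d_diag = [], [], [], []
    for i in range(0, h, 2):
        rows = ([], [], [], [])
        for j in range(0, w, 2):
            a, b = img[i][j], img[i][j + 1]
            c, d = img[i + 1][j], img[i + 1][j + 1]
            rows[0].append((a + b + c + d) / 2)  # low frequency: scaled 2x2 average
            rows[1].append((a + b - c - d) / 2)  # top row minus bottom row
            rows[2].append((a - b + c - d) / 2)  # left column minus right column
            rows[3].append((a - b - c + d) / 2)  # diagonal detail
        for band, row in zip((low, d_rows, d_cols, d_diag), rows):
            band.append(row)
    return low, d_rows, d_cols, d_diag


def haar_idwt2(low: Matrix, d_rows: Matrix, d_cols: Matrix, d_diag: Matrix) -> Matrix:
    """Inverse of haar_dwt2: rebuilds each 2x2 block from its four coefficients."""
    h, w = len(low), len(low[0])
    img = [[0.0] * (2 * w) for _ in range(2 * h)]
    for i in range(h):
        for j in range(w):
            s, x, y, z = low[i][j], d_rows[i][j], d_cols[i][j], d_diag[i][j]
            img[2 * i][2 * j] = (s + x + y + z) / 2
            img[2 * i][2 * j + 1] = (s + x - y - z) / 2
            img[2 * i + 1][2 * j] = (s - x + y - z) / 2
            img[2 * i + 1][2 * j + 1] = (s - x - y + z) / 2
    return img


def fuse(src_a: Matrix, src_b: Matrix) -> Matrix:
    """Hypothetical hand-crafted fusion rule standing in for the learned branches:
    average the low-frequency bands, keep the larger-magnitude detail coefficient."""
    bands_a, bands_b = haar_dwt2(src_a), haar_dwt2(src_b)
    low = [[(p + q) / 2 for p, q in zip(ra, rb)]
           for ra, rb in zip(bands_a[0], bands_b[0])]
    details = [
        [[p if abs(p) >= abs(q) else q for p, q in zip(ra, rb)]
         for ra, rb in zip(band_a, band_b)]
        for band_a, band_b in zip(bands_a[1:], bands_b[1:])
    ]
    return haar_idwt2(low, *details)


# Usage with two toy 4x4 "modalities" (illustrative values only).
infrared = [[10, 10, 0, 0], [10, 10, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
visible = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
fused = fuse(infrared, visible)
```

Because the Haar transform is orthonormal, `haar_idwt2(*haar_dwt2(x))` recovers `x` up to floating-point rounding, so any loss of information comes from the per-band fusion rule rather than from decomposition and reconstruction. A learned approach such as the one described above replaces the fixed filters and the averaging and max-magnitude rules with trainable components.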
