Keywords: Transformer; infrared; image fusion; computer science; artificial intelligence; computer vision; optics
Authors
Jinshi Guo, Yang Li, Yutong Chen, Ling Yu
Abstract
Infrared and visible image fusion aims to integrate complementary thermal radiation and detail information to enhance scene understanding. Transformer architectures have shown promising performance in this field, but their feed-forward networks struggle to model multi-scale features, and self-attention typically aggregates features over all query-key token pairs, so irrelevant tokens introduce noise. To address these issues, this paper proposes a Sparse Dual Aggregation Transformer-based network for Infrared and Visible Image Fusion (SDATFuse). First, a hybrid multi-scale feed-forward network is introduced to effectively model multi-scale information and extract cross-modal features. Next, a sparse spatial self-attention mechanism is developed, using a dynamic top-k selection operator to filter the key self-attention values. By applying sparse spatial self-attention and channel self-attention in consecutive Transformer blocks, SDATFuse constructs a dual aggregation structure that efficiently integrates inter-block features. Additionally, a Dynamic Interaction Module (DIM) aggregates intra-block features across the two self-attention dimensions. Finally, in the fusion stage, a Dual Selective Attention Module (DSAM) dynamically weights global and local features from both modalities using spatial and channel self-attention maps. Experiments on multiple infrared and visible image datasets show that SDATFuse outperforms state-of-the-art models in both qualitative and quantitative evaluations, effectively reducing noise while preserving detail.
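The sparse spatial self-attention is described above only at a high level. The following minimal PyTorch sketch illustrates the underlying idea of top-k sparse attention: keep only the k largest attention scores per query and mask out the rest before the softmax, so dissimilar tokens contribute nothing to the output. The function name `topk_sparse_attention` and the fixed `top_k` argument are illustrative assumptions, not the paper's code; SDATFuse's operator selects k dynamically, and this sketch omits the channel self-attention branch and the dual aggregation structure.

```python
import torch

def topk_sparse_attention(q, k, v, top_k):
    """Scaled dot-product attention restricted to the top-k keys per query.

    q, k, v: tensors of shape (batch, heads, tokens, dim)
    top_k:   number of keys each query may attend to (fixed here for
             simplicity; the paper's operator chooses k dynamically)
    """
    scale = q.shape[-1] ** -0.5
    scores = (q @ k.transpose(-2, -1)) * scale        # (B, H, N, N)

    # Keep the top-k scores in each query row; set the rest to -inf
    # so they vanish after the softmax.
    topk_vals, _ = scores.topk(top_k, dim=-1)         # sorted descending
    threshold = topk_vals[..., -1:]                   # k-th largest per row
    scores = scores.masked_fill(scores < threshold, float("-inf"))

    attn = scores.softmax(dim=-1)
    return attn @ v

# Usage with random features (shapes chosen arbitrarily for illustration):
B, H, N, d = 2, 4, 64, 32
q, k, v = (torch.randn(B, H, N, d) for _ in range(3))
out = topk_sparse_attention(q, k, v, top_k=16)        # (2, 4, 64, 32)
```

Setting `top_k` equal to the token count reduces this to standard dense attention, which is a simple way to sanity-check the masking logic.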