Infrared
Computer Science
Image (Mathematics)
Fusion
Image Fusion
Artificial Intelligence
Computer Vision
Physics
Optics
Linguistics
Philosophy
Authors
Xiaoli Zhang, Liying Wang, Libo Zhao, Xiongfei Li, Siwei Ma
Source
Journal: Cornell University - arXiv
Date: 2024-06-11
Cited by: 1
Identifier
DOI: 10.48550/arxiv.2407.06159
Abstract
Multi-modality image fusion aims to fuse modality-specific (complementary) and modality-shared (correlated) information from multiple source images. To address the neglect of inter-feature relationships, the loss of high-frequency information, and the limited attention paid to downstream tasks, this paper focuses on modeling correlation-driven decomposed features and reasoning over high-level graph representations by efficiently extracting complementary information and aggregating multi-guided features. We propose a three-branch encoder-decoder architecture with corresponding fusion layers as the fusion strategy. First, shallow features from each modality are extracted by a depthwise convolution layer combined with a transformer block. In the three parallel branches of the encoder, the Cross Attention and Invertible block (CAI) extracts local features and preserves high-frequency texture details; the Base Feature Extraction module (BFE) captures long-range dependencies and enhances modality-shared information; and the Graph Reasoning module (GR) reasons about high-level cross-modality relations while extracting low-level detail features as modality-specific complementary information for CAI. Experiments demonstrate competitive results against state-of-the-art methods on visible/infrared image fusion and medical image fusion tasks. Moreover, the proposed algorithm surpasses state-of-the-art methods on downstream tasks, scoring on average 8.27% higher mAP@0.5 in object detection and 5.85% higher mIoU in semantic segmentation. The code is available at https://github.com/Abraham-Einstein/SMFNet/.
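Below is a minimal PyTorch sketch of the three-branch encoder-decoder layout the abstract describes. It is illustrative only: the CAI, BFE, and GR branches are replaced with simple convolutional stand-ins, and every class name, channel width, and shape here is an assumption rather than the authors' implementation (see https://github.com/Abraham-Einstein/SMFNet/ for the real code).

import torch
import torch.nn as nn

class ShallowFeatureExtractor(nn.Module):
    # Depthwise convolution followed by a lightweight self-attention block,
    # mirroring the "depthwise convolution layer combined with a transformer
    # block" step in the abstract (internals assumed).
    def __init__(self, dim=32):
        super().__init__()
        self.stem = nn.Conv2d(1, dim, 3, padding=1)
        self.dw = nn.Conv2d(dim, dim, 3, padding=1, groups=dim)  # depthwise
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):
        f = self.dw(self.stem(x))
        b, c, h, w = f.shape
        t = f.flatten(2).transpose(1, 2)             # (B, H*W, C) tokens
        q = self.norm(t)
        t = t + self.attn(q, q, q)[0]                # residual self-attention
        return t.transpose(1, 2).reshape(b, c, h, w)

class ThreeBranchFusion(nn.Module):
    # Three parallel branches (detail/CAI, base/BFE, graph/GR) whose outputs
    # are concatenated and decoded into one fused image. Each branch here is
    # a placeholder conv; the paper's actual modules are far more elaborate.
    def __init__(self, dim=32):
        super().__init__()
        self.shallow = ShallowFeatureExtractor(dim)
        self.cai = nn.Conv2d(2 * dim, dim, 3, padding=1)  # stand-in for CAI
        self.bfe = nn.Conv2d(2 * dim, dim, 3, padding=1)  # stand-in for BFE
        self.gr = nn.Conv2d(2 * dim, dim, 1)              # stand-in for GR
        self.decoder = nn.Sequential(
            nn.Conv2d(3 * dim, dim, 3, padding=1), nn.ReLU(),
            nn.Conv2d(dim, 1, 3, padding=1))

    def forward(self, visible, infrared):
        # Shared shallow extractor applied to each modality, then fused.
        f = torch.cat([self.shallow(visible), self.shallow(infrared)], dim=1)
        branches = torch.cat([self.cai(f), self.bfe(f), self.gr(f)], dim=1)
        return self.decoder(branches)

vis = torch.rand(1, 1, 64, 64)  # visible image (grayscale for simplicity)
ir = torch.rand(1, 1, 64, 64)   # infrared image
print(ThreeBranchFusion()(vis, ir).shape)  # torch.Size([1, 1, 64, 64])

The sketch keeps the structural idea (one shared shallow extractor, three parallel branches, concatenation before a shared decoder) while omitting the invertible blocks, cross attention, and graph reasoning that do the real work in the paper.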