Computer Science
Remote Sensing
Separation (statistics)
Image Fusion
Artificial Intelligence
Geology
Machine Learning
Image (mathematics)
Authors
Yong Wang,Jing Jia,Rui Liu,Qianqian Cao,Jie Feng,Danping Li,Lei Wang
Source
Journal: Remote Sensing
[Multidisciplinary Digital Publishing Institute]
Date: 2025-04-10
Volume/Issue: 17 (8): 1350
Abstract
Target detection in remote sensing images has garnered significant attention due to its wide range of applications. Many traditional methods rely primarily on unimodal data, which often struggles to address the complexities of remote sensing environments. Furthermore, small-target detection remains a critical challenge in remote sensing image analysis, as small targets occupy only a few pixels, making feature extraction difficult and prone to errors. To address these challenges, this paper revisits existing multimodal fusion methodologies and proposes a novel separation-before-fusion (SBF) framework. Leveraging this framework, we present Sep-Fusion, an efficient target detection approach tailored for multimodal remote sensing aerial imagery. Within the modality separation module (MSM), the method separates the three RGB channels of visible-light images into independent modalities aligned with infrared image channels. Each channel undergoes independent feature extraction through the unimodal block (UB) to effectively capture modality-specific features. The extracted features are then fused by the feature attention fusion (FAF) module, which integrates channel attention and spatial attention mechanisms to enhance multimodal feature interaction. To improve the detection of small targets, an image regeneration module is employed during training; it combines a super-resolution strategy with attention mechanisms to further optimize high-resolution feature representations for subsequent localization and detection. Sep-Fusion is currently built on the YOLO series, making it a candidate real-time detector. Its lightweight architecture enables high computational efficiency while maintaining the desired detection accuracy. Experimental results on the multimodal VEDAI dataset show that Sep-Fusion achieves 77.9% mAP50, surpassing many state-of-the-art models.
Ablation experiments further illustrate the respective contributions of modality separation and attention fusion. The adaptability of our multimodal method to unimodal target detection is also verified on the NWPU VHR-10 and DIOR datasets, showing Sep-Fusion to be a suitable alternative to current detectors in various remote sensing scenarios.
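The separation-before-fusion idea described above can be sketched in a few lines: treat each RGB channel and the infrared channel as independent modalities, extract per-modality features, then fuse them with learned attention weights. The sketch below is a minimal NumPy toy, not the paper's implementation; `unimodal_features` stands in for the unimodal block (UB), and only the channel-attention half of the FAF module is illustrated. All function names here are hypothetical.

```python
import numpy as np

def unimodal_features(channel, gain=1.0):
    # Placeholder for the paper's unimodal block (UB): a real model would
    # apply a per-modality convolutional feature extractor here.
    return channel * gain

def channel_attention(feats):
    # Squeeze each modality to a scalar (global average pooling), then
    # produce softmax weights over modalities - a toy channel attention.
    squeezed = np.array([f.mean() for f in feats])
    exp = np.exp(squeezed - squeezed.max())
    return exp / exp.sum()

def sep_fusion(rgb, ir):
    # Modality separation module (MSM): R, G, B, and IR become four
    # independent single-channel modalities.
    modalities = [rgb[..., i] for i in range(3)] + [ir]
    feats = [unimodal_features(m) for m in modalities]
    # Feature attention fusion (FAF), channel-attention part only:
    # weight each modality's feature map and sum.
    weights = channel_attention(feats)
    return sum(w * f for w, f in zip(weights, feats))

# Usage: a 4x4 RGB image plus an aligned 4x4 IR image fuse into one map.
rgb = np.ones((4, 4, 3))
ir = np.ones((4, 4))
fused = sep_fusion(rgb, ir)
```

Because the attention weights are a softmax, they sum to one, so the fused map stays on the same scale as the inputs; the paper's spatial-attention branch and super-resolution regeneration module are omitted here for brevity.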