Computer Science
Remote Sensing
Separation (statistics)
Image Fusion
Artificial Intelligence
Geology
Machine Learning
Image (mathematics)
Authors
Yong Wang,Jing Jia,Rui Liu,Qianqian Cao,Jie Feng,Danping Li,Lei Wang
Source
Journal: Remote Sensing
[Multidisciplinary Digital Publishing Institute]
Date: 2025-04-10
Volume/Issue: 17 (8): 1350
Abstract
Target detection in remote sensing images has garnered significant attention due to its wide range of applications. Many traditional methods rely primarily on unimodal data, which often struggles to address the complexities of remote sensing environments. Furthermore, small-target detection remains a critical challenge in remote sensing image analysis, as small targets occupy only a few pixels, making feature extraction difficult and prone to errors. To address these challenges, this paper revisits existing multimodal fusion methodologies and proposes a novel separation-before-fusion (SBF) framework. Leveraging this framework, we present Sep-Fusion, an efficient target detection approach tailored for multimodal remote sensing aerial imagery. Within the modality separation module (MSM), the method separates the three RGB channels of visible-light images into independent modalities aligned with infrared image channels. Each channel undergoes independent feature extraction through the unimodal block (UB) to effectively capture modality-specific features. The extracted features are then fused by the feature attention fusion (FAF) module, which integrates channel attention and spatial attention mechanisms to enhance multimodal feature interaction. To improve the detection of small targets, an image regeneration module is employed during training; it combines a super-resolution strategy with attention mechanisms to further optimize high-resolution feature representations for subsequent localization and detection. Sep-Fusion is currently built on the YOLO series, making it a candidate real-time detector. Its lightweight architecture enables high computational efficiency while maintaining the desired detection accuracy. Experimental results on the multimodal VEDAI dataset show that Sep-Fusion achieves 77.9% mAP50, surpassing many state-of-the-art models.
Ablation experiments further illustrate the respective contributions of modality separation and attention fusion. The adaptability of our multimodal method to unimodal target detection is also verified on the NWPU VHR-10 and DIOR datasets, showing Sep-Fusion to be a suitable alternative to current detectors in various remote sensing scenarios.
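The separation-before-fusion idea described above can be sketched in a few lines: treat each RGB channel and the infrared channel as independent modalities, extract per-modality features, then fuse them with learned attention weights. The sketch below is a minimal NumPy toy, not the paper's implementation; `unimodal_features` stands in for the unimodal block (UB), and only the channel-attention half of the FAF module is illustrated. All function names here are hypothetical.

```python
import numpy as np

def unimodal_features(channel, gain=1.0):
    # Placeholder for the paper's unimodal block (UB): a real model would
    # apply a per-modality convolutional feature extractor here.
    return channel * gain

def channel_attention(feats):
    # Squeeze each modality to a scalar (global average pooling), then
    # produce softmax weights over modalities - a toy channel attention.
    squeezed = np.array([f.mean() for f in feats])
    exp = np.exp(squeezed - squeezed.max())
    return exp / exp.sum()

def sep_fusion(rgb, ir):
    # Modality separation module (MSM): R, G, B, and IR become four
    # independent single-channel modalities.
    modalities = [rgb[..., i] for i in range(3)] + [ir]
    feats = [unimodal_features(m) for m in modalities]
    # Feature attention fusion (FAF), channel-attention part only:
    # weight each modality's feature map and sum.
    weights = channel_attention(feats)
    return sum(w * f for w, f in zip(weights, feats))

# Usage: a 4x4 RGB image plus an aligned 4x4 IR image fuse into one map.
rgb = np.ones((4, 4, 3))
ir = np.ones((4, 4))
fused = sep_fusion(rgb, ir)
```

Because the attention weights are a softmax, they sum to one, so the fused map stays on the same scale as the inputs; the paper's spatial-attention branch and super-resolution regeneration module are omitted here for brevity.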