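Perspective (graphical)
Computer science
Remote sensing
Object detection
Dual (grammatical number)
Artificial intelligence
Computer vision
Object (grammar)
Pattern recognition (psychology)
Geology
Literature
Art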
Authors
Yanfeng Liu, Wei Guo, Chaojun Yao, Lefei Zhang
Identifier
DOI: 10.1109/tgrs.2025.3577046
Abstract
Anchor-based detectors have recently achieved decent performance in multimodal remote sensing scenarios, whereas their anchor-free counterparts have not reached comparable results. To remedy this problem, we first comprehensively investigate the misalignment issues in multimodal features and detection heads, and then present a dual-perspective alignment learning (DPAL) framework for multimodal remote sensing object detection. Specifically, we design a cross-modal alignment module (CMAM), which uses a multiscale dilation strategy and a differentiable alignment function with channel-wise modulation to integrate cross-modal features. Additionally, to cope with the misalignment problem in the regression and classification heads, we propose a task-head alignment module (THAM). THAM introduces a novel pseudo-anchor mechanism, adopts a semi-fixed offset generation strategy to capture task-variant sampling coordinates, and finally deploys an offset knowledge transfer mechanism with deformable alignment for anchor-free detection heads. Extensive experiments on four multimodal object detection datasets show that the proposed DPAL framework achieves impressive results. The project code is released at https://github.com/lyf0801/DPAL.
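The abstract does not specify how the deformable alignment in THAM is computed. As a generic illustration only, and not the DPAL implementation (see the linked repository for that), the sampling step shared by deformable-alignment methods can be sketched in NumPy: features are gathered on a regular k×k grid around a location, each grid point is shifted by a fractional (dy, dx) offset, and values are read by bilinear interpolation. The helper names `bilinear_sample` and `deformable_sample`, the 3×3 grid, and zero padding outside the map are assumptions made for this sketch.

```python
import numpy as np


def bilinear_sample(feat, y, x):
    """Read a (C, H, W) feature map at fractional (y, x); positions outside the map count as zero."""
    feat = np.asarray(feat, dtype=np.float64)
    _, h, w = feat.shape
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    dy, dx = y - y0, x - x0
    out = np.zeros(feat.shape[0], dtype=np.float64)
    # Accumulate the four integer neighbours, each weighted by its bilinear coefficient.
    for yy, wy in ((y0, 1.0 - dy), (y0 + 1, dy)):
        for xx, wx in ((x0, 1.0 - dx), (x0 + 1, dx)):
            if 0 <= yy < h and 0 <= xx < w:
                out += wy * wx * feat[:, yy, xx]
    return out


def deformable_sample(feat, center, offsets, kernel=3):
    """Hypothetical helper: sample a kernel x kernel grid around `center`, each point shifted by its offset.

    `offsets` holds one (dy, dx) pair per grid point in row-major order.
    Returns an array of shape (kernel * kernel, C).
    """
    r = kernel // 2
    base = [(i, j) for i in range(-r, r + 1) for j in range(-r, r + 1)]
    offsets = np.asarray(offsets, dtype=np.float64).reshape(len(base), 2)
    cy, cx = center
    samples = [
        bilinear_sample(feat, cy + by + oy, cx + bx + ox)
        for (by, bx), (oy, ox) in zip(base, offsets)
    ]
    return np.stack(samples)


if __name__ == "__main__":
    feat = np.arange(2 * 5 * 5, dtype=np.float64).reshape(2, 5, 5)
    # Zero offsets reproduce the regular 3x3 neighbourhood around (2, 2).
    regular = deformable_sample(feat, (2, 2), np.zeros((9, 2)))
    # Half-pixel offsets blend neighbouring features instead of snapping to the grid.
    blended = deformable_sample(feat, (2, 2), np.full((9, 2), 0.5))
    print(regular.shape, blended.shape)
```

With zero offsets the operation reduces to an ordinary 3×3 neighbourhood read; learned, task-specific offsets are what let a regression head and a classification head sample different locations. A deformable convolution would then weight and sum these samples; that step is omitted here to keep the sketch focused on sampling.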