计算机科学
模态(人机交互)
对象(语法)
人工智能
作者
Wenhao Dong,Haodong Zhu,Shaohui Lin,Xiaoyan Luo,Yunhang Shen,Guodong Guo,Baochang Zhang
标识
DOI:10.1109/tmm.2025.3599020
摘要
Cross-modality object detection aims to fuse complementary information from different modalities to improve model performance, which achieves a wider range of applications. However, traditional cross-modality fusion methods, based on CNN or Transformer, inadequately address the issue of pseudo-target information, which causes model attention dispersion to degrade object detection performance. In this paper, we investigate a novel cross-modality fusion approach by associating cross-modal features in a hidden state space based on an improved Mamba with a gating attention mechanism. We propose the Fusion-Mamba Block(FMB), designed to map cross-modal features into a hidden state space for interaction, thereby refining the model’s attention on true target areas and enhancing overall performance. The FMB comprises two key modules: State Space Channel Swapping (SSCS) module, which facilitates the fusion of shallow features, and Dual State Space Fusion (DSSF) module, which enables deep fusion and effectively suppresses pseudo-target information within the hidden state space. Our proposed method outperforms state-of-the-art approaches, achieving improvements of 5.9%, 3.5% and 2.1% mAP on $M^{3}$FD, DroneVehicle and FLIR-Aligned, respectively. To the best of our knowledge, this work establishes a new baseline for cross-modality object detection, providing a robust foundation for future research in this area.
科研通智能强力驱动
Strongly Powered by AbleSci AI