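Computer science
Feature (linguistics)
Artificial intelligence
Fusion
Focus (optics)
Feature extraction
Object detection
Pattern recognition (psychology)
Object (grammar)
Encoding (set theory)
Sensor fusion
Transformation (genetics)
Image fusion
Feature model
Computer vision
Information fusion
Feature detection (computer vision)
Feature learning
Matching (statistics)
Feature selection
Component (thermodynamics)
Data mining
Deep learning
Key (lock)
Stacking
Encoding (memory)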
Authors
Hao Lei,Lina Xu,Chang Liu,Yanni Dong
Identifier
DOI:10.48550/arxiv.2506.21018
Abstract
Effective deep feature extraction via feature-level fusion is crucial for multimodal object detection. However, previous studies often involve complex training processes that integrate modality-specific features by stacking multiple feature-level fusion units, leading to significant computational overhead. To address this issue, we propose a new fusion detection baseline that uses a single feature-level fusion unit to enable high-performance detection, thereby simplifying the training process. Based on this approach, we propose a lightweight attention-guided self-modulation feature fusion network (LASFNet), which introduces a novel attention-guided self-modulation feature fusion (ASFF) module that adaptively adjusts the responses of fusion features at both global and local levels based on attention information from different modalities, thereby promoting comprehensive and enriched feature generation. Additionally, a lightweight feature attention transformation module (FATM) is designed at the neck of LASFNet to enhance the focus on fused features and minimize information loss. Extensive experiments on three representative datasets demonstrate that, compared to state-of-the-art methods, our approach achieves a favorable efficiency-accuracy trade-off, reducing the number of parameters and computational cost by as much as 90% and 85%, respectively, while improving detection accuracy (mAP) by 1%-3%. The code will be open-sourced at https://github.com/leileilei2000/LASFNet.
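To make the idea of modulating fused features at both a global and a local level concrete, here is a minimal sketch in plain Python. It is not the ASFF module from the paper, whose internals the abstract does not specify. The function name `fuse`, the per-channel softmax weighting, and the per-position sigmoid gate are all assumptions chosen purely for illustration, and the two inputs are labeled RGB and infrared only as an example pair of modalities.

```python
import math


def sigmoid(x):
    """Logistic function, used here as a simple gate in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def fuse(rgb, ir):
    """Fuse two feature maps in a single pass (hypothetical illustration).

    rgb, ir: lists of channels; each channel is a flat list of activations.
    Returns a fused feature map with the same shape.
    """
    fused = []
    for c_rgb, c_ir in zip(rgb, ir):
        # Global step (assumed): weight each modality per channel by a
        # two-way softmax over mean activations, which reduces to a sigmoid
        # of their difference.
        mean_rgb = sum(c_rgb) / len(c_rgb)
        mean_ir = sum(c_ir) / len(c_ir)
        w_rgb = sigmoid(mean_rgb - mean_ir)
        w_ir = 1.0 - w_rgb

        channel = []
        for a, b in zip(c_rgb, c_ir):
            base = w_rgb * a + w_ir * b
            # Local step (assumed): a per-position gate computed from both
            # modalities rescales the fused response.
            local_gate = sigmoid(a + b)
            channel.append(base * (1.0 + local_gate))
        fused.append(channel)
    return fused


if __name__ == "__main__":
    rgb = [[0.2, 0.9, 0.1], [0.5, 0.5, 0.5]]
    ir = [[0.8, 0.1, 0.3], [0.0, 1.0, 0.2]]
    for channel in fuse(rgb, ir):
        print([round(v, 3) for v in channel])
```

The point of the sketch is structural: one fusion unit combines the two modalities once, rather than stacking several fusion stages. A real implementation would use learned attention weights on tensors instead of fixed functions on lists.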