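Computer science
Artificial intelligence
Object detection
Benchmark (surveying)
Modality (human-computer interaction)
Modal verb
Generality
Computer vision
Point cloud
Pattern
Scale (ratio)
Object (grammar)
Pattern recognition (psychology)
Chemistry
Physics
Geodesy
Quantum mechanics
Polymer chemistry
Psychotherapist
Geography
Psychology
Social science
Sociology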
Authors
Bonan Ding,Jin Xie,Jing Nie
Identifier
DOI:10.1109/icassp49357.2023.10095671
Abstract
Multi-modal 3D object detection, which classifies and locates objects in 3D space by combining point clouds captured by lidars with RGB images captured by cameras, serves as the basis for autonomous driving. Most existing methods aggregate features from point clouds and images by plain element-wise addition or multiplication. Although these methods improve detection accuracy, such simple operations have difficulty balancing the two modalities. Furthermore, the multi-level features from images suffer from imbalanced receptive fields. To address these problems, we propose two novel networks: a cross-modality balance network (CMN) and a cross-scale balance network (CSN). CMN uses cross-modality attention mechanisms to balance the importance and receptive fields of the two modalities. CSN employs cross-scale attention mechanisms to reduce the imbalance in multi-level features. Experiments are performed on the challenging KITTI benchmark. The results show consistent improvements across different 3D object detection frameworks, which verifies the effectiveness and generality of the proposed networks.
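To make the contrast in the abstract concrete, the following is a minimal sketch of plain element-wise addition next to a generic scaled dot-product cross-attention fusion. It is not the authors' CMN: the abstract does not specify the architecture, so the function names (`add_fusion`, `cross_attention_fusion`), the choice of point features as queries and image features as keys and values, and the residual addition are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def add_fusion(point_feats, image_feats):
    """Plain element-wise addition.

    Assumes exactly one image feature per point feature, so each point
    only ever sees the single image location paired with it.
    """
    return [
        [p + i for p, i in zip(pf, imf)]
        for pf, imf in zip(point_feats, image_feats)
    ]


def cross_attention_fusion(point_feats, image_feats):
    """Hypothetical cross-modality attention fusion (illustrative only).

    Each point feature acts as a query over all image features, which
    serve as both keys and values. The attended image feature is added
    back to the point feature as a residual.
    """
    dim = len(point_feats[0])
    fused = []
    for q in point_feats:
        # Scaled dot-product similarity between this point and every image feature.
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(dim)
            for k in image_feats
        ]
        weights = softmax(scores)
        # Weighted sum of image features, one value per channel.
        attended = [
            sum(w * v[d] for w, v in zip(weights, image_feats))
            for d in range(dim)
        ]
        fused.append([qd + ad for qd, ad in zip(q, attended)])
    return fused
```

In this sketch the attention weights let each point draw on image features from many locations and decide how much each one contributes, whereas `add_fusion` fixes both the pairing and the weighting in advance. A real detector would use learned query, key, and value projections over batched tensors rather than Python lists.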