Computer science
Semantics (computer science)
Benchmark (surveying)
Feature (linguistics)
Artificial intelligence
Modal
Feature extraction
Pairwise comparison
Segmentation
Object detection
Key (lock)
Computer vision
Pattern recognition (psychology)
Linguistics
Philosophy
Chemistry
Computer security
Geodesy
Polymer chemistry
Programming language
Geography
Authors
Ying Yang, Hui Yin, Aixin Chong, Jin Wan, Qing-Yi Liu
Identifier
DOI: 10.1109/tiv.2023.3348099
Abstract
LiDAR-camera fusion-based 3D object detection is one of the main visual perception tasks in autonomous driving, facing the challenges of small targets and occlusions. Image semantics are beneficial for these issues, yet most existing methods apply semantics only in the cross-modal fusion stage to compensate for point geometric features, so the advantages of semantic information are not effectively exploited. Furthermore, the increased network complexity caused by introducing semantics is a major obstacle to real-time operation. In this paper, we propose a Semantic-Aware Cross-modal Interaction Network (SACINet) for real-time 3D object detection, which introduces high-level semantics into both key stages: image feature extraction and cross-modal fusion. Specifically, we design a Lightweight Semantic-aware Image Feature Extractor (LSIFE) to enhance semantic sampling of objects while greatly reducing the parameter count. Additionally, a Semantic-Modulated Cross-modal Interaction Mechanism (SMCIM) is proposed to emphasize semantic details in cross-modal fusion. This mechanism conducts a pairwise interactive fusion among geometric features, semantic-aware point-wise image features, and semantic-aware point-wise segmentation features through the designed Conditions Generation Network (CGN) and Semantic-Aware Point-wise Feature Modulation (SAPFM). Ultimately, we construct a real-time (25.2 fps) 3D detector with a small parameter footprint (23.79 MB), achieving a better trade-off between accuracy and efficiency. Comprehensive experiments on the KITTI benchmark show that SACINet is effective for real-time 3D detection, especially on small and severely occluded targets. Furthermore, semantic occupancy perception experiments on the recent nuScenes-Occupancy benchmark verify the effectiveness of SMCIM.
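As a rough illustration of the modulation idea described in the abstract, the sketch below gives one plausible reading of how CGN and SAPFM could condition per-point features on semantics: the semantic-aware point-wise features generate scale/shift conditions that modulate the geometric and image feature streams before they are fused pairwise. The FiLM-style affine form, the module interfaces, and all tensor shapes are assumptions for illustration only; the paper's actual layer designs are not reproduced here.

```python
# Hypothetical sketch of semantic-modulated cross-modal fusion (not the authors' code).
# Assumes a FiLM-style affine modulation: semantics -> (gamma, beta) -> applied per point.
import torch
import torch.nn as nn


class ConditionsGenerationNet(nn.Module):
    """Hypothetical CGN: maps semantic-aware point-wise features to (gamma, beta) conditions."""
    def __init__(self, sem_dim: int, feat_dim: int):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(sem_dim, feat_dim),
            nn.ReLU(inplace=True),
            nn.Linear(feat_dim, 2 * feat_dim),  # concatenated scale and shift
        )

    def forward(self, sem_feat: torch.Tensor):
        gamma, beta = self.mlp(sem_feat).chunk(2, dim=-1)
        return gamma, beta


class SemanticAwarePointwiseModulation(nn.Module):
    """Hypothetical SAPFM: modulates a point-wise feature stream with semantic conditions."""
    def __init__(self, sem_dim: int, feat_dim: int):
        super().__init__()
        self.cgn = ConditionsGenerationNet(sem_dim, feat_dim)

    def forward(self, feat: torch.Tensor, sem_feat: torch.Tensor) -> torch.Tensor:
        gamma, beta = self.cgn(sem_feat)
        return feat * (1.0 + gamma) + beta  # FiLM-style affine modulation per point


# Pairwise interactive fusion among the three per-point feature streams named in the
# abstract (geometric, image, segmentation); N points and C channels are placeholders.
N, C = 2048, 64
geo = torch.randn(N, C)  # point geometric features
img = torch.randn(N, C)  # semantic-aware point-wise image features
seg = torch.randn(N, C)  # semantic-aware point-wise segmentation features

modulate = SemanticAwarePointwiseModulation(sem_dim=C, feat_dim=C)
fused = torch.cat([modulate(geo, seg), modulate(img, seg), seg], dim=-1)
print(fused.shape)  # torch.Size([2048, 192])
```

The fused per-point features would then feed the detection head; how the paper actually combines the modulated streams (concatenation, gating, or attention) is not specified here.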