Computer science
Cloud computing
Point cloud
Object (grammar)
Point (geometry)
Artificial intelligence
Object detection
Computer vision
Segmentation
Mathematics
Geometry
Operating system
Authors
Renzhong Qiao, Hongbing Ji, Zhigang Zhu, Wenbo Zhang
Identifier
DOI: 10.1109/tcsvt.2024.3396870
Abstract
LiDAR, as an excellent sensor, can provide positions, motion states, and other objective attributes of objects in the 3D world. Inevitably, the inherent sparsity of point clouds and the problem of occlusion tend to cause incomplete semantic and geometric information for long-range small objects, posing challenges to 3D object detection. Multi-view models exploit the complementary information among the bird's eye view (BEV), range view (RV), and other views to alleviate these issues. However, most existing methods learn the views' features coarsely and neglect semantic information, which leads to unsatisfactory detection performance. To this end, this paper proposes a Local-to-Global Semantic Learning Network (LGSLNet) for multi-view 3D object detection from point clouds. The proposed LGSLNet effectively learns semantic information, exploring the local semantics contained in the various channels of the RV features and fusing them with the BEV features. It has two branches with different backbones. In the BEV branch, voxels quantized from the point cloud are processed by sparse convolutional networks and compressed into BEV features. In the RV branch, a multi-scale backbone with semantic-aware convolution (SAC) is designed to learn the local semantic information of the RV; an auxiliary network allows the SAC to adapt to 3D locations. In the fusion module, a bidirectional cross-view channel attention (Bi-CCA) is designed to compensate for the semantic information between the views and to aggregate new RV and BEV features. Extensive experiments on the KITTI, ONCE, and nuScenes 3D object detection datasets demonstrate the superiority of the proposed method.
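To make the fusion idea concrete, below is a minimal PyTorch sketch of what a bidirectional cross-view channel attention could look like, assuming an SE-style channel gate and that the BEV and RV feature maps share the same channel count C. The class names, reduction factor, and residual wiring are illustrative assumptions, not the authors' released Bi-CCA implementation; channel attention is used here because it only needs global per-channel statistics, so the two views' different spatial resolutions do not matter.

# Hypothetical sketch of bidirectional cross-view channel attention
# (SE-style gating); not the paper's official code.
import torch
import torch.nn as nn

class CrossViewChannelAttention(nn.Module):
    """Gates the channels of one view using global statistics of the other."""

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, src: torch.Tensor, dst: torch.Tensor) -> torch.Tensor:
        # src, dst: (B, C, H, W); spatial sizes may differ between views,
        # since the gate is built only from channel-wise global statistics.
        stats = src.mean(dim=(2, 3))            # (B, C) global average pool
        gate = self.mlp(stats)[:, :, None, None]  # (B, C, 1, 1) channel weights
        return dst * gate                        # re-weight dst's channels

class BidirectionalFusion(nn.Module):
    """Applies the attention in both directions: RV->BEV and BEV->RV."""

    def __init__(self, channels: int):
        super().__init__()
        self.rv_to_bev = CrossViewChannelAttention(channels)
        self.bev_to_rv = CrossViewChannelAttention(channels)

    def forward(self, bev: torch.Tensor, rv: torch.Tensor):
        new_bev = bev + self.rv_to_bev(rv, bev)  # compensate BEV with RV semantics
        new_rv = rv + self.bev_to_rv(bev, rv)    # and vice versa
        return new_bev, new_rv

if __name__ == "__main__":
    bev = torch.randn(2, 64, 200, 176)  # BEV grid features
    rv = torch.randn(2, 64, 48, 512)    # range-view image features
    new_bev, new_rv = BidirectionalFusion(64)(bev, rv)
    print(new_bev.shape, new_rv.shape)

The residual connections keep each view's original features intact while adding the cross-view gated signal, mirroring the abstract's description of "compensating" semantic information between views rather than replacing one view with the other.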