Keywords
Computer science
Point cloud
Artificial intelligence
Computer vision
LiDAR
Pedestrian detection
Object detection
Perspective (graphics)
Image fusion
Voxel
Pattern recognition
Remote sensing
Transportation engineering
Authors
Ke Wang,Tianqiang Zhou,Zhichuang Zhang,Tao Chen,Junlan Chen
Identifier
DOI: 10.1016/j.engappai.2023.105951
Abstract
The detection of small objects such as pedestrians still poses challenges for LiDAR-based 3D object detection because point clouds are sparse and unordered. Conversely, camera images provide rich semantic information, which makes these small objects easier to detect. To exploit the advantages of both sensors for better 3D object detection, research on fusing LiDAR and camera information is being actively conducted. Existing fusion methods between point clouds and images are usually weighted toward the point clouds, so the semantic information of images is not fully utilized. We propose a new fusion method, PVFusion, to fuse more image features. We first assign each point to a separate perspective voxel and project the voxel onto the image feature maps; the semantic feature of the perspective voxel is then fused with the geometric feature of the point. A 3D object detection model (PVF-DectNet) is designed using PVFusion. During training we employ ground-truth paste (GT-Paste) data augmentation and resolve the occlusion problem caused by the newly added objects. On the KITTI validation set, PVF-DectNet shows a 3.6% AP improvement over other feature fusion methods in pedestrian detection. On the KITTI test set, PVF-DectNet outperforms the other multi-modal state-of-the-art methods by 2.2% AP in pedestrian detection. PVFusion also handles sparse point clouds better than PointFusion in both the car and pedestrian categories: in the 32-beam LiDAR setting, it yields a 4.2% AP gain on moderate-difficulty cars and a 5.2% mAP improvement on pedestrians.
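The core fusion step the abstract describes, projecting each LiDAR point (or its perspective voxel) onto the image feature map and concatenating the sampled semantic feature with the point's geometric feature, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, nearest-neighbour sampling, and the KITTI-style 3x4 projection matrix `P` are assumptions.

```python
import numpy as np

def fuse_point_image_features(points, point_feats, img_feats, P):
    """Project points onto an image feature map and concatenate the
    sampled semantic features with the per-point geometric features.

    points:      (N, 3) xyz coordinates in the camera frame
    point_feats: (N, Cp) per-point geometric features
    img_feats:   (H, W, Ci) image feature map
    P:           (3, 4) camera projection matrix (KITTI-style, assumed)
    """
    n = points.shape[0]
    homo = np.hstack([points, np.ones((n, 1))])        # (N, 4) homogeneous
    proj = homo @ P.T                                  # (N, 3)
    u = proj[:, 0] / proj[:, 2]                        # pixel columns
    v = proj[:, 1] / proj[:, 2]                        # pixel rows

    h, w, _ = img_feats.shape
    # Clamp to valid pixel indices (nearest-neighbour sampling as a
    # simplification; bilinear interpolation is a common alternative).
    ui = np.clip(np.round(u).astype(int), 0, w - 1)
    vi = np.clip(np.round(v).astype(int), 0, h - 1)
    semantic = img_feats[vi, ui]                       # (N, Ci)

    # Fused feature: geometric part followed by semantic part.
    return np.concatenate([point_feats, semantic], axis=1)  # (N, Cp + Ci)
```

The fused `(N, Cp + Ci)` tensor could then feed a downstream 3D detection head; the paper's perspective-voxel construction would replace the single-pixel lookup shown here with a projected voxel region.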