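Artificial intelligence
Computer vision
Computer science
Monocular
Feature (linguistics)
Object detection
Initialization
Feature extraction
Pattern recognition (psychology)
Philosophy
Linguistics
Programming language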
Authors
Chongben Tao,Jiecheng Cao,Chen Wang,Zufeng Zhang,Zhen Gao
Identifier
DOI:10.1109/tcsvt.2023.3237579
Abstract
Current monocular 3D object detection algorithms generally suffer from inaccurate depth estimation, which reduces detection accuracy. The depth error of image-to-image generation for a stereo view is insignificant compared with the gap incurred by single-image generation. Therefore, a novel pseudo-monocular 3D object detection framework, called Pseudo-Mono, is proposed, which brings stereo images into monocular 3D detection. First, stereo images are taken as input, and a lightweight depth predictor generates the depth map of the input images. Second, the left images from the stereo camera are used as subjects to generate an enhanced visual feature and a multi-scale depth feature by depth indexing and feature matching probabilities, respectively. Finally, sparse anchors set by the foreground probability maps and the multi-scale feature maps are used as reference points to find a suitable initialization for the object query. The encoded visual feature is used to enhance the object query, enabling deep interaction between the visual feature and the depth feature. Compared with popular monocular 3D object detection methods, Pseudo-Mono obtains richer fine-grained information without additional data input. Extensive experiments on the KITTI, nuScenes, and MS-COCO datasets demonstrate the generalizability and portability of the proposed method, and extensive ablation experiments demonstrate its effectiveness and efficiency. Experiments on a real vehicle platform show that the method maintains high performance in complex real-world environments.
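The abstract does not specify how sparse anchors are derived from the foreground probability maps. As a rough illustration only, the sketch below assumes a simple threshold-plus-top-k rule on a single probability map and normalizes the selected pixel centres into reference points. The function names, the threshold, and the toy map are all hypothetical and are not taken from the paper.

```python
from typing import List, Tuple


def select_sparse_anchors(
    prob_map: List[List[float]], k: int, threshold: float = 0.5
) -> List[Tuple[int, int, float]]:
    """Return up to k (row, col, prob) cells with the highest foreground
    probability at or above the threshold (hypothetical selection rule)."""
    candidates = [
        (p, r, c)
        for r, row in enumerate(prob_map)
        for c, p in enumerate(row)
        if p >= threshold
    ]
    # Highest probability first; ties broken by row, then column.
    candidates.sort(key=lambda t: (-t[0], t[1], t[2]))
    return [(r, c, p) for p, r, c in candidates[:k]]


def to_reference_points(
    anchors: List[Tuple[int, int, float]], height: int, width: int
) -> List[Tuple[float, float]]:
    """Map anchor cells to normalized (x, y) pixel-centre coordinates in [0, 1]."""
    return [((c + 0.5) / width, (r + 0.5) / height) for r, c, _ in anchors]


# Toy 3x4 foreground probability map (made-up values).
prob_map = [
    [0.05, 0.10, 0.80, 0.20],
    [0.02, 0.95, 0.60, 0.10],
    [0.01, 0.30, 0.40, 0.70],
]
anchors = select_sparse_anchors(prob_map, k=3)
reference_points = to_reference_points(anchors, height=3, width=4)
print(anchors)
print(reference_points)
```

In a query-based detector, reference points of this kind would typically seed the positions of the object queries; how Pseudo-Mono combines them with the multi-scale feature maps is not described in the abstract.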