Monocular
Artificial intelligence
Computer vision
Computer science
Object (grammar)
Object detection
Pattern recognition (psychology)
Authors
Guohua Liu, Haiyang Lian, Changrui Guo
Identifier
DOI:10.1088/1361-6501/ad50f6
Abstract
To obtain accurate 3D information, the correct use of depth data is crucial. Compared with radar-based methods, detecting objects in 3D space from a single image is extremely challenging because depth cues are absent; monocular 3D object detection, however, offers a more economical solution. Traditional monocular 3D object detection methods often rely on geometric constraints, such as keypoints, object shape relationships, and 3D-to-2D optimization, to compensate for the inherent lack of depth information, yet they still struggle to extract rich information directly from depth estimation for fusion. To fundamentally enhance monocular 3D object detection, we propose a monocular 3D object detection network based on depth information enhancement. The network learns the object detection and depth estimation tasks simultaneously within a unified framework, integrates depth features into the detection branch as auxiliary information, and then constrains and enhances them to obtain a better spatial representation. To this end, we introduce a new cross-modal fusion strategy that achieves a more reasonable fusion of cross-modal information by exploiting the redundant and complementary information between RGB and depth features and their interactions. Extensive experiments on the KITTI dataset show that our method significantly improves the performance of monocular 3D object detection.
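The abstract does not specify how the cross-modal fusion is implemented. The sketch below is only an illustration of one plausible reading of "exploiting redundant and complementary information between RGB and depth features": a learned per-pixel gate blends the two modalities where they agree (redundant part), while a projection of the concatenated features injects modality-specific cues (complementary part). The module name, channel sizes, and gating design are assumptions, not the authors' published architecture.

```python
# Illustrative sketch only; not the paper's released code.
import torch
import torch.nn as nn


class CrossModalFusion(nn.Module):
    """Fuse RGB and depth feature maps by separating shared (redundant)
    and modality-specific (complementary) components with learned gates."""

    def __init__(self, channels: int):
        super().__init__()
        # Gate estimating, per pixel, how much the two modalities agree.
        self.shared_gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        # Projection that mixes the complementary (disagreeing) parts back in.
        self.complement_proj = nn.Conv2d(
            2 * channels, channels, kernel_size=3, padding=1
        )

    def forward(self, rgb_feat: torch.Tensor, depth_feat: torch.Tensor) -> torch.Tensor:
        stacked = torch.cat([rgb_feat, depth_feat], dim=1)
        gate = self.shared_gate(stacked)                       # agreement weight in [0, 1]
        shared = gate * rgb_feat + (1.0 - gate) * depth_feat   # redundant information
        complementary = self.complement_proj(stacked)          # modality-specific cues
        return shared + complementary                          # enhanced detection feature


if __name__ == "__main__":
    fusion = CrossModalFusion(channels=64)
    rgb = torch.randn(1, 64, 96, 320)    # features from the RGB detection branch (assumed shape)
    depth = torch.randn(1, 64, 96, 320)  # features from the depth-estimation branch (assumed shape)
    print(fusion(rgb, depth).shape)      # torch.Size([1, 64, 96, 320])
```

In this reading, the fused output feeds the detection branch in place of the plain RGB features, so the depth branch acts purely as auxiliary information, consistent with the unified multi-task framework the abstract describes.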