Keywords
Segmentation
Fusion
Artificial intelligence
Computer science
Estimation
Point (geometry)
Computer vision
Pattern recognition (psychology)
Mathematics
Engineering
Linguistics
Geometry
Philosophy
Systems engineering
Authors
Hatem Ibrahem, Ahmed Salem, Hyun‐Soo Kang
Identifier
DOI: 10.1109/tiv.2024.3370930
Abstract
Depth estimation is an important task in autonomous driving and usually requires special sensors or multiple cameras. In this paper, we propose a novel approach to monocular depth estimation based on two cheaper annotation tasks, semantic segmentation and single vanishing point prediction, without the need for ground-truth depth data. Under a Manhattan-world assumption, a single vanishing point exists and marks the end of the scene's extension along the z-axis. Based on the semantic segmentation prediction, we define hand-crafted rules that determine the depth of each pixel from its label and its spatial position relative to the vanishing point. We train two convolutional neural networks (CNNs): a semantic segmentation CNN and a vanishing point prediction CNN. We then fuse the results of the two networks using the hand-crafted rules, which are derived from single-view geometry by taking into account the label of each pixel and the nature of the object reported by the segmentation model. Extensive experiments were conducted on the KITTI and Cityscapes benchmark datasets. The proposed model achieves strong performance in semantic segmentation (mean intersection over union of 82.20%) and vanishing point estimation (mean absolute error of 1.87). Monocular depth estimation achieves a relative absolute error of 0.070 on KITTI and 0.289 on Cityscapes, outperforming many state-of-the-art methods in depth estimation and semantic segmentation while running at 10 frames per second.
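The abstract does not state the exact hand-crafted rules, but the general idea of fusing a per-pixel label map with a predicted vanishing point can be sketched as follows. This is a minimal illustrative sketch, not the paper's method: the label ids (0 = ground, 1 = sky), the linear depth ramp toward the vanishing-point row, and the 80 m maximum depth are all assumptions chosen for clarity.

```python
import numpy as np

def depth_from_labels_and_vp(labels, vp_row, max_depth=80.0):
    """Toy depth rule (assumed, not the paper's actual rules).

    labels:  (H, W) int array of segmentation label ids
    vp_row:  image row of the predicted vanishing point
    Ground pixels (label 0) get depth that grows linearly as they
    approach the vanishing-point row; sky pixels (label 1) get
    max_depth; all other labels get a mid-range constant.
    """
    h, w = labels.shape
    # Row index of every pixel, broadcast across columns.
    rows = np.tile(np.arange(h, dtype=float).reshape(-1, 1), (1, w))
    # 1.0 at the vanishing-point row, falling to 0.0 at the image bottom.
    ramp = np.clip(1.0 - (rows - vp_row) / max(h - vp_row, 1), 0.0, 1.0)
    depth = np.full((h, w), max_depth / 2.0)
    depth[labels == 0] = (max_depth * ramp)[labels == 0]  # ground plane
    depth[labels == 1] = max_depth                        # sky
    return depth
```

In a real pipeline the label map and vanishing point would come from the two trained CNNs, and the rules would differ per object class; the sketch only shows how the two predictions can be combined per pixel without any ground-truth depth.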