Computer science
Artificial intelligence
Segmentation
Semantics (computer science)
Computer vision
Consistency (knowledge bases)
Pixel
Multi-task learning
Monocular
Task (project management)
Management
Economics
Programming language
Authors
Junning Zhang, Qunxing Su, Bo Tang, Cheng Wang, Yining Li
Identifier
DOI:10.1109/tnnls.2021.3107362
Abstract
Multitask joint learning continues to gain attention as a paradigm shift and has shown promising performance in many applications. Depth estimation and semantic understanding from monocular images remain a challenging problem in computer vision. While other joint learning frameworks establish the relationship between semantics and depth from stereo pairs, their failure to learn camera motion leaves them unable to model the geometric structure of the image scene. In this article, we take a further step by proposing a multitask learning method, DPSNet, which jointly performs depth estimation, camera pose estimation, and semantic scene segmentation. Our core idea for depth and camera pose prediction is a rigid semantic consistency loss that overcomes the limitation that moving pixels pose for image reconstruction, and from which the segmentation of moving instances is further inferred. In addition, the proposed model performs semantic segmentation by reasoning about the geometric correspondences between per-pixel semantic outputs and semantic labels at multiscale resolutions. Experiments on open-source datasets and a video dataset captured on a micro smart car demonstrate the effectiveness of each component of DPSNet, and DPSNet achieves state-of-the-art results on all three tasks compared with popular recent methods. All our models and code are available at https://github.com/jn-z/DPSNet: Multitask Learning Using Geometry Reasoning for Scene Depth and Semantics.
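The image-reconstruction losses the abstract refers to rest on rigid-scene geometry: a pixel with known depth is back-projected to 3D, moved by the estimated camera motion, and re-projected into the other view, where photometric or semantic consistency can be checked. As a rough illustration only (not the authors' implementation; the function `rigid_warp` and the intrinsics `K` are hypothetical), the per-pixel warp can be sketched as:

```python
import numpy as np

def rigid_warp(u, v, depth, K, R, t):
    """Map pixel (u, v) with known depth from the source view into the
    target view under a rigid camera motion (R, t).
    Returns the corresponding (u', v') in the target image plane."""
    p = np.array([u, v, 1.0])
    X = depth * np.linalg.inv(K) @ p   # back-project to 3D camera coordinates
    X_t = R @ X + t                    # apply the rigid transform
    proj = K @ X_t                     # project into the target view
    return proj[0] / proj[2], proj[1] / proj[2]

# Hypothetical pinhole intrinsics for illustration
K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])

# Sanity check: identity motion maps each pixel onto itself
u2, v2 = rigid_warp(100.0, 80.0, depth=5.0, K=K,
                    R=np.eye(3), t=np.zeros(3))
# → (100.0, 80.0)
```

Pixels on independently moving objects violate this rigid warp, which is the limitation the rigid semantic consistency loss targets and what allows moving instances to be segmented from the residual inconsistency.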