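Artificial intelligence
Computer science
Monocular
Discriminative
Computer vision
Channel (broadcasting)
Convolutional neural network
Margin (machine learning)
Ground truth
Benchmark (surveying)
Feature (linguistics)
Pattern recognition (psychology)
Image warping
Generalization
Depth map
Image (mathematics)
Mathematics
Machine learning
Geology
Computer network
Linguistics
Philosophy
Geodesy
Mathematical analysis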
Authors
Zhuping Wang,Xinke Dai,Zhanyu Guo,Chao Huang,Hao Zhang
Identifier
DOI:10.1109/tnnls.2022.3221416
Abstract
Understanding 3-D scene geometry from videos is a fundamental topic in visual perception. In this article, we propose an unsupervised framework for monocular depth and camera motion estimation that trains on unlabeled monocular videos, overcoming the difficulty of acquiring per-pixel ground-truth depth at scale. The photometric loss, which is computed by warping nearby views to the target view using the estimated depth and pose, couples the depth network and the pose network and is essential to the unsupervised method. We introduce a channelwise attention mechanism to exploit the relationships between channels and a spatialwise attention mechanism to exploit the inner-spatial relationships of features. Applied in the depth network, both mechanisms better activate the feature information between different convolutional layers and extract more discriminative features. In addition, we apply the Sobel boundary to our edge-aware smoothness term for better accuracy and clearer boundaries and structures. Together, these contributions help close the gap with fully supervised methods, yielding high-quality state-of-the-art results on the KITTI benchmark and strong generalization performance on the Make3D dataset.
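
To make the edge-aware smoothness idea more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not give the formula, so the code assumes a common form of the term, in which the depth gradient is penalized with weight exp(-|image gradient|) and both gradients are taken with 3x3 Sobel kernels. The function names `sobel` and `edge_aware_smoothness`, the single-channel list-of-lists inputs, and the restriction to interior pixels are all illustrative assumptions.

```python
import math

# Standard 3x3 Sobel kernels for horizontal and vertical gradients.
SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))


def sobel(img, kernel, r, c):
    """Correlate a 3x3 kernel with img, centred on interior pixel (r, c)."""
    return sum(kernel[i][j] * img[r + i - 1][c + j - 1]
               for i in range(3) for j in range(3))


def edge_aware_smoothness(depth, image):
    """Assumed form: mean over interior pixels of
    |grad depth| * exp(-|grad image|), summed over the x and y directions.

    depth and image are equally sized 2-D lists (at least 3x3) of floats.
    """
    h, w = len(depth), len(depth[0])
    total, count = 0.0, 0
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            for kernel in (SOBEL_X, SOBEL_Y):
                d_grad = abs(sobel(depth, kernel, r, c))
                i_grad = abs(sobel(image, kernel, r, c))
                # Strong image edges shrink the weight, so depth is
                # allowed to change sharply where the image does.
                total += d_grad * math.exp(-i_grad)
            count += 1
    return total / count


# Toy 4x4 inputs: a vertical step edge and a constant map.
step = [[1.0, 1.0, 2.0, 2.0] for _ in range(4)]
flat = [[1.0] * 4 for _ in range(4)]

loss_no_image_edge = edge_aware_smoothness(step, flat)    # depth edge, flat image
loss_with_image_edge = edge_aware_smoothness(step, step)  # depth edge matches image edge
```

In this toy case the depth discontinuity is penalized far less when the image has an edge at the same location, which is the behavior an edge-aware term is meant to have. A real training pipeline would compute the same quantity on batched tensors in a deep learning framework, typically with mean-normalized disparity.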