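Computer science
Artificial intelligence
Internet
Generalization
Segmentation
Noise (video)
Key (lock)
Training set
Deep learning
Machine learning
Computer vision
Pattern recognition (psychology)
Data mining
Image (mathematics)
World Wide Web
Computer security
Mathematics
Mathematical analysis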
Authors
Zhengqi Li, Noah Snavely
Identifier
DOI:10.1109/cvpr.2018.00218
Abstract
Single-view depth prediction is a fundamental problem in computer vision. Recently, deep learning methods have led to significant progress, but such methods are limited by the available training data. Current datasets based on 3D sensors have key limitations, including indoor-only images (NYU), small numbers of training examples (Make3D), and sparse sampling (KITTI). We propose to use multi-view Internet photo collections, a virtually unlimited data source, to generate training data via modern structure-from-motion and multi-view stereo (MVS) methods, and present a large depth dataset called MegaDepth based on this idea. Data derived from MVS comes with its own challenges, including noise and unreconstructable objects. We address these challenges with new data cleaning methods, as well as automatically augmenting our data with ordinal depth relations generated using semantic segmentation. We validate the use of large amounts of Internet data by showing that models trained on MegaDepth exhibit strong generalization, not only to novel scenes, but also to other diverse datasets including Make3D, KITTI, and DIW, even when no images from those datasets are seen during training.