Keywords
Epipolar geometry, Computer science, Artificial intelligence, Transformer, Computer vision, Consistency, Matching, Feature, Context, Feature matching, Feature extraction, Mathematics, Image
Authors
Jie Zhu, Bo Peng, Wanqing Li, Haifeng Shen, Qingming Huang, Jianjun Lei
Abstract
This article proposes a network, referred to as Multi-View Stereo TRansformer (MVSTR), for depth estimation from multi-view images. By modeling long-range dependencies and epipolar geometry, the proposed MVSTR is capable of extracting dense features with global context and 3D consistency, which are crucial for reliable matching in multi-view stereo (MVS). Specifically, to tackle the problem of the limited receptive field of existing CNN-based MVS methods, a global-context Transformer module is designed to establish intra-view long-range dependencies so that global contextual features of each view are obtained. In addition, to further enable features of each view to be 3D consistent, a 3D-consistency Transformer module with an epipolar feature sampler is built, where epipolar geometry is modeled to effectively facilitate cross-view interaction. Experimental results show that the proposed MVSTR achieves the best overall performance on the DTU dataset and demonstrates strong generalization on the Tanks & Temples benchmark dataset.
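As a rough illustration of the cross-view interaction the abstract describes, the sketch below (a minimal PyTorch example, not the authors' released code) samples source-view features along the epipolar line of each reference pixel by projecting a set of depth hypotheses into the source view, then lets each reference-pixel feature attend to those samples. The function and class names (`epipolar_sample`, `CrossViewEpipolarAttention`) and the specific parameterization, depth-hypothesis sampling plus a single cross-attention layer, are assumptions made for illustration only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def epipolar_sample(src_feat, ref_pix, depths, K_ref, K_src, R, t):
    """Sample source-view features along each reference pixel's epipolar line.

    src_feat : (C, H, W)    source-view feature map
    ref_pix  : (N, 2)       reference pixel coordinates (x, y)
    depths   : (D,)         depth hypotheses along the reference viewing ray
    K_ref, K_src : (3, 3)   camera intrinsics
    R, t     : (3, 3), (3,) rotation / translation from reference to source
    returns  : (N, D, C)    features sampled at D epipolar locations per pixel
    """
    C, H, W = src_feat.shape
    N, D = ref_pix.shape[0], depths.shape[0]

    # Back-project each reference pixel at every depth hypothesis to a 3D point.
    ones = torch.ones(N, 1, dtype=ref_pix.dtype, device=ref_pix.device)
    pix_h = torch.cat([ref_pix, ones], dim=1)                # (N, 3) homogeneous coords
    rays = (torch.linalg.inv(K_ref) @ pix_h.T).T             # (N, 3) viewing rays
    pts = rays[:, None, :] * depths[None, :, None]           # (N, D, 3) 3D points

    # Project the 3D points into the source view; they lie on the epipolar line.
    pts_src = (R @ pts.reshape(-1, 3).T).T + t               # (N*D, 3)
    proj = (K_src @ pts_src.T).T
    uv = proj[:, :2] / proj[:, 2:3].clamp(min=1e-6)          # (N*D, 2) pixel coords

    # Normalize to [-1, 1] and bilinearly sample the source feature map.
    grid = uv.clone()
    grid[:, 0] = 2.0 * uv[:, 0] / (W - 1) - 1.0
    grid[:, 1] = 2.0 * uv[:, 1] / (H - 1) - 1.0
    grid = grid.reshape(1, N, D, 2)
    sampled = F.grid_sample(src_feat[None], grid, align_corners=True)  # (1, C, N, D)
    return sampled[0].permute(1, 2, 0)                       # (N, D, C)


class CrossViewEpipolarAttention(nn.Module):
    """Attend from each reference-pixel feature to features on its epipolar line."""

    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, ref_feat, epi_feat):
        # ref_feat: (N, C) one query per reference pixel; epi_feat: (N, D, C) keys/values.
        q = ref_feat[:, None, :]                   # (N, 1, C)
        out, _ = self.attn(q, epi_feat, epi_feat)  # cross-view interaction
        return ref_feat + out[:, 0, :]             # residual update of the reference feature
```

In the full pipeline described by the abstract, an intra-view self-attention (global-context) module would run over each view's feature map before such a cross-view step; the abstract specifies both modules only at a high level, so the above is a sketch of the general idea rather than the paper's exact architecture.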