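Computer science
Monocular
Artificial intelligence
Feature (linguistics)
Encoder
Convolutional neural network
Focus (optics)
Computer vision
Representation (politics)
Generalization
Process (computing)
Decoding methods
Feature extraction
Encoding (memory)
Semantics (computer science)
Pattern recognition (psychology)
Depth map
Inference
Scheme (mathematics)
Artificial neural network
Contrast (vision)
Semantic feature
Network architecture
Deep learning
Context (archaeology)
Convolution (computer science)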
Authors
Peihong Wu,Mengxiao Yin,Pengfei Lai,Zhiqiang Su,Feng Zhan,Bei Hua
Identifier
DOI:10.1109/ijcnn64981.2025.11229233
Abstract
Self-supervised monocular depth estimation has attracted widespread attention because it does not require hard-to-obtain depth labels during training. Many existing studies focus on the design of depth encoders and often neglect the potential of decoders. As a result, decoders struggle to fully utilize the multi-scale features extracted by the encoder, fail to capture features comprehensively, and fall short in recovering local details. To address these issues, this paper proposes MD-Mono, a lightweight self-supervised monocular depth estimation architecture. MD-Mono employs a hybrid depth encoder that combines Convolutional Neural Networks (CNNs) and Transformers to capture both local features and global semantic information. For the depth decoder, we propose an Adaptive Depth Focus (ADF) module and an Implicit Detail Enhancement (IDE) module. The ADF module adaptively adjusts each stage of the decoding process according to the input features, effectively integrating and utilizing multi-scale features. The IDE module implicitly maps the input to a high-dimensional, nonlinear feature space, capturing more detailed feature information for recovering local details. Together, these two modules enable our architecture to achieve a semantically richer and spatially more accurate representation with fewer parameters. Experimental results show that MD-Mono significantly outperforms Monodepth2 in accuracy and generalizes well to the Make3D and DrivingStereo datasets.
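In self-supervised methods such as Monodepth2, the training signal that replaces depth labels is usually a photometric reprojection loss. A source frame is warped into the target view using the predicted depth and relative camera pose, and the reconstruction is then scored against the target with a mix of SSIM and L1. The abstract does not state MD-Mono's exact loss, so the following is only a minimal sketch of the Monodepth2-style formulation, which assumes α = 0.85, a 3x3 SSIM window with reflection padding, a per-pixel minimum over source views, and optional auto-masking. It takes already-warped images as input and omits the depth/pose view-synthesis step. All function names are hypothetical.

```python
import numpy as np
from typing import Optional, Sequence


def avg_pool3x3(x: np.ndarray) -> np.ndarray:
    """3x3 mean filter over the last two axes (H, W) with reflection padding."""
    pad = [(0, 0)] * (x.ndim - 2) + [(1, 1), (1, 1)]
    xp = np.pad(x, pad, mode="reflect")
    h, w = x.shape[-2], x.shape[-1]
    out = np.zeros(x.shape, dtype=np.float64)
    for dy in range(3):
        for dx in range(3):
            out += xp[..., dy:dy + h, dx:dx + w]
    return out / 9.0


def ssim_dissimilarity(x: np.ndarray, y: np.ndarray,
                       c1: float = 0.01 ** 2, c2: float = 0.03 ** 2) -> np.ndarray:
    """Per-pixel (1 - SSIM) / 2, clipped to [0, 1], using 3x3 local statistics."""
    mu_x, mu_y = avg_pool3x3(x), avg_pool3x3(y)
    sigma_x = avg_pool3x3(x * x) - mu_x ** 2
    sigma_y = avg_pool3x3(y * y) - mu_y ** 2
    sigma_xy = avg_pool3x3(x * y) - mu_x * mu_y
    num = (2 * mu_x * mu_y + c1) * (2 * sigma_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (sigma_x + sigma_y + c2)
    return np.clip((1 - num / den) / 2, 0.0, 1.0)


def photometric_error(pred: np.ndarray, target: np.ndarray,
                      alpha: float = 0.85) -> np.ndarray:
    """Per-pixel error of shape (H, W) for images of shape (C, H, W) in [0, 1]."""
    l1 = np.abs(pred - target).mean(axis=0)
    ssim = ssim_dissimilarity(pred, target).mean(axis=0)
    return alpha * ssim + (1 - alpha) * l1


def min_reprojection_loss(target: np.ndarray,
                          warped_sources: Sequence[np.ndarray],
                          raw_sources: Optional[Sequence[np.ndarray]] = None) -> float:
    """Mean over pixels of the minimum photometric error across source views.

    warped_sources: source frames already warped into the target view
    (assumed to come from predicted depth + pose; that step is not shown).
    raw_sources: optional unwarped source frames for auto-masking. Their
    errors join the per-pixel minimum, so pixels where an unwarped frame
    already matches best (static scene or camera) take the identity error,
    which in a real training setup carries no gradient to depth or pose.
    """
    errors = [photometric_error(w, target) for w in warped_sources]
    if raw_sources is not None:
        errors += [photometric_error(s, target) for s in raw_sources]
    per_pixel = np.min(np.stack(errors), axis=0)
    return float(per_pixel.mean())
```

As a usage example, `min_reprojection_loss(target, [target.copy(), other])` is zero because a perfectly reconstructed view wins the per-pixel minimum everywhere. The minimum (rather than an average) over source views is the Monodepth2 design choice for handling pixels that are occluded in one of the source frames.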