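Computer science
Encoding (memory)
Artificial intelligence
Segmentation
RGB color model
Encoding
Representation (politics)
Pattern recognition (psychology)
Image segmentation
Computer vision
Encoding (set theory)
Semantics (computer science)
Feature extraction
Sequence (biology)
Discriminant
Visualization
Natural language processing
Scale-space segmentation
Feature learning
Key (lock)
Decoding methods
Image (mathematics)
Architecture
Task analysis
Deep learning
Semantic mapping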
Authors
Bo-Wen Yin, Jiao-Long Cao, Dan Xu, Ming-Ming Cheng, Qibin Hou
Identifier
DOI:10.1109/tpami.2026.3658114
Abstract
We explore the potential of the pretrain-and-finetune paradigm for RGB-D semantic segmentation as a way to resolve the mismatch problem common in this field. Specifically, we present DFormer++, a novel RGB-D pretrain-and-finetune framework that learns transferable representations for RGB-D semantic segmentation. The paper makes two key contributions. 1) Framework perspective: unlike existing methods that finetune an RGB-pretrained backbone on RGB-D scenes, we pretrain the backbone on image-depth pairs from ImageNet-1K, which endows the model with the capacity to encode RGB-D representations. 2) Architecture perspective: our model comprises a sequence of RGB-D attention blocks, tailored to encode both RGB and depth information through a novel attention mechanism. DFormer++ thereby avoids the mismatched encoding of the 3D geometric relationships in depth maps by RGB-pretrained backbones, a problem that is widespread in previous work but has remained unresolved. Meanwhile, the tailored architecture greatly reduces redundant parameters for encoding RGB-D data and achieves efficient and accurate perception. Experimental results show that DFormer++ achieves cutting-edge performance on three popular RGB-D semantic segmentation benchmarks. Our code is available at: https://github.com/VCIP-RGBD/DFormer.