3-D Convolutional Neural Networks for RGB-D Salient Object Detection and Beyond

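Keywords: Computer science; RGB color model; Convolutional neural network; Artificial intelligence; Object detection; Pattern recognition (psychology); Object (grammar); Salience; Computer vision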

Authors

Qian Chen, Zhenxi Zhang, Yanye Lu, Keren Fu, Qijun Zhao

Source

Journal: IEEE Transactions on Neural Networks and Learning Systems [Institute of Electrical and Electronics Engineers]
Date: 2022-09-13; Volume (Issue): 35 (3): 4309-4323; Citations: 29

Links

nih.gov, doi.org

Identifiers

DOI: 10.1109/tnnls.2022.3202241

Abstract

RGB-depth (RGB-D) salient object detection (SOD) recently has attracted increasing research interest, and many deep learning methods based on encoder–decoder architectures have emerged. However, most existing RGB-D SOD models conduct explicit and controllable cross-modal feature fusion either in the single encoder or decoder stage, which hardly guarantees sufficient cross-modal fusion ability. To this end, we make the first attempt in addressing RGB-D SOD through 3-D convolutional neural networks. The proposed model, named RD3D, aims at prefusion in the encoder stage and in-depth fusion in the decoder stage to effectively promote the full integration of RGB and depth streams. Specifically, RD3D first conducts prefusion across RGB and depth modalities through a 3-D encoder obtained by inflating 2-D ResNet and later provides in-depth feature fusion by designing a 3-D decoder equipped with rich back-projection paths (RBPPs) for leveraging the extensive aggregation ability of 3-D convolutions. Toward an improved model RD3D+, we propose to disentangle the conventional 3-D convolution into successive spatial and temporal convolutions and, meanwhile, discard unnecessary zero padding. This eventually results in a 2-D convolutional equivalence that facilitates optimization and reduces parameters and computation costs. Thanks to such a progressive-fusion strategy involving both the encoder and the decoder, effective and thorough interactions between the two modalities can be exploited and boost detection accuracy. As an additional boost, we also introduce channel-modality attention and its variant after each path of RBPP to attend to important features. Extensive experiments on seven widely used benchmark datasets demonstrate that RD3D and RD3D+ perform favorably against 14 state-of-the-art RGB-D SOD approaches in terms of five key evaluation metrics. Our code will be made publicly available at https://github.com/PPOLYpubki/RD3D .
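The abstract's claim that disentangling a 3-D convolution into successive spatial and temporal convolutions reduces parameters can be illustrated with simple arithmetic. The sketch below is not taken from the RD3D code: the function names, the channel widths (64), and the 3×3×3 kernel are hypothetical values chosen only for illustration, and bias terms are ignored.

```python
def conv3d_params(c_in, c_out, kt, kh, kw):
    """Weight count of a 3-D convolution with a kt x kh x kw kernel (no bias)."""
    return c_in * c_out * kt * kh * kw


def factorized_params(c_in, c_mid, c_out, kt, kh, kw):
    """Weight count when the 3-D convolution is split into two steps:
    a spatial 1 x kh x kw convolution followed by a temporal kt x 1 x 1 one.
    c_mid is the (assumed) channel width between the two steps."""
    spatial = conv3d_params(c_in, c_mid, 1, kh, kw)
    temporal = conv3d_params(c_mid, c_out, kt, 1, 1)
    return spatial + temporal


# Hypothetical layer: 64 -> 64 channels, 3 x 3 x 3 kernel.
full = conv3d_params(64, 64, 3, 3, 3)
fact = factorized_params(64, 64, 64, 3, 3, 3)

print("full 3-D conv weights:  ", full)
print("factorized conv weights:", fact)
print("ratio:", round(fact / full, 3))
```

Under these assumed sizes the factorized pair needs fewer than half the weights of the full 3-D kernel. The actual savings in RD3D+ depend on the layer widths and kernel shapes used in the paper, which the abstract does not specify.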
