Learning Discriminative Cross-Modality Features for RGB-D Saliency Detection

Keywords: RGB color model, artificial intelligence, discriminative, pattern recognition (psychology), computer science, correlation, modality (human-computer interaction), feature (linguistics), computer vision, concatenation (mathematics), pixel, optics (focusing), segmentation, mathematics, combinatorics, linguistics, optics, physics, philosophy, geometry
Authors
Fengyun Wang, Jinshan Pan, Shoukun Xu, Jinhui Tang
Source
Journal: IEEE Transactions on Image Processing (Institute of Electrical and Electronics Engineers), vol. 31, pp. 1285-1297. Cited by: 88
Identifier
DOI: 10.1109/tip.2022.3140606
Abstract

How to extract useful information from depth is key to the success of RGB-D saliency detection methods. Because RGB and depth images come from different domains, simply concatenating their features leads to unsatisfactory results due to the modality gap. To improve performance, most methods focus on bridging this gap by designing various cross-modal feature-fusion modules, while neglecting to explicitly extract the useful consistent information shared between the two modalities. To overcome this problem, we develop a simple yet effective RGB-D saliency detection method that learns discriminative cross-modality features with a deep neural network. The proposed method first learns modality-specific features for the RGB and depth inputs. We then separately compute the correlation of every pixel pair in a cross-modality-consistent way, i.e., the correlations computed from RGB features (RGB correlation) and from depth features (depth correlation) share consistent distribution ranges. Although derived from different perspectives, color and spatial, the RGB and depth correlations both describe how tightly each pixel pair is related. Second, to gather RGB and depth information complementarily, we propose a novel correlation-fusion module that fuses the RGB and depth correlations into a cross-modality correlation. Finally, the features are refined with both long-range cross-modality correlations and local depth correlations to predict saliency maps: the long-range cross-modality correlation provides context information for accurate localization, while the local depth correlation preserves subtle structures for fine segmentation. In addition, a lightweight DepthNet is designed for efficient depth feature extraction. The proposed network is trained in an end-to-end manner. Both quantitative and qualitative experimental results demonstrate that the proposed algorithm performs favorably against state-of-the-art methods.
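To make the pipeline in the abstract concrete, the following is a minimal NumPy sketch, not the paper's implementation. Per-pixel L2 normalization is one plausible way to realize the "consistent distribution ranges" the abstract describes, since it bounds both RGB and depth correlations to cosine similarities in [-1, 1]; the weighted fusion and softmax-weighted aggregation are illustrative placeholders for the correlation-fusion and long-range refinement modules, whose exact form the abstract does not specify. All names (pixel_pair_correlations, fuse_and_refine, alpha) are hypothetical.

```python
import numpy as np

def pixel_pair_correlations(feat, eps=1e-8):
    """Cosine-similarity correlation between every pixel pair.

    feat: (C, H, W) feature map. L2-normalizing each pixel's feature
    vector keeps the correlations in [-1, 1] for either modality,
    one way to realize the consistent ranges the abstract describes.
    """
    c, h, w = feat.shape
    x = feat.reshape(c, h * w).T                      # (HW, C)
    x = x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)
    return x @ x.T                                    # (HW, HW)

def fuse_and_refine(feat_rgb, feat_d, alpha=0.5):
    """Fuse RGB and depth correlations, then refine the RGB features.

    The element-wise weighted fusion and the softmax-weighted
    aggregation below are stand-ins, not the paper's exact modules.
    """
    a_rgb = pixel_pair_correlations(feat_rgb)         # RGB correlation
    a_d = pixel_pair_correlations(feat_d)             # depth correlation
    a_cross = alpha * a_rgb + (1.0 - alpha) * a_d     # cross-modality correlation

    # Long-range refinement: every pixel aggregates features from all
    # others, weighted by a row-wise softmax over the fused correlation.
    weights = np.exp(a_cross - a_cross.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)

    c, h, w = feat_rgb.shape
    x = feat_rgb.reshape(c, h * w).T                  # (HW, C)
    refined = weights @ x                             # (HW, C)
    return refined.T.reshape(c, h, w)

# Usage: random features standing in for CNN / DepthNet outputs.
rng = np.random.default_rng(0)
f_rgb = rng.standard_normal((16, 8, 8)).astype(np.float32)
f_d = rng.standard_normal((16, 8, 8)).astype(np.float32)
print(fuse_and_refine(f_rgb, f_d).shape)              # (16, 8, 8)
```

The row-wise softmax aggregation is the standard non-local/self-attention pattern for turning a pixel-pair affinity matrix into long-range context; the local depth-correlation refinement for fine structures would operate analogously but restricted to spatial neighborhoods.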