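Artificial intelligence
RGB color model
Computer vision
Computer science
Feature (linguistics)
Salience
Object (grammar)
Object detection
Pattern recognition (psychology)
Cognition
Feature extraction
Psychology
Linguistics
Philosophy
Neuroscience
Authors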
Huizhi Wang,Hui Guo,Xiongli Chai,Baoyang Mu,Feng Shao
Identifier
DOI:10.1109/tim.2025.3600718
Abstract
Salient Object Detection (SOD) is widely used in quality inspection scenarios, such as rail surface detection. Recent studies have shown that incorporating complementary information such as depth and thermal images benefits SOD. Effectively leveraging the advantages of each modality while eliminating inter-modality noise in multi-level fusion has been a research hotspot. Most existing works use convolution and attention mechanisms for modality interaction but overlook semantic similarity during fusion, leading to poor performance in some challenging scenarios. In this paper, inspired by psychology studies on the human visual system (HVS), we propose a Dynamic Feature Integration Network (DFINet) that simulates the correlation mechanism between human attention and semantics for RGB-D and RGB-T SOD. Specifically, to better capture the modality-specific features in semantics, we first employ a multi-granularity pre-segmentation method, namely the Pre-segmentation Injection Module (PIM), to enhance and preserve the modality-specific features at different layers of the backbone network. Then, a Dynamic Feature Fusion Module (DFFM) is devised to simulate the HVS mechanism in which specific semantic regions gain more attention. This module evaluates the semantic similarity between the features of different modalities and determines the weight of each modality's features in the fusion. The encoded multi-modal features are fed into a staircase decoder, which retains deep semantic information to boost accuracy. Extensive experiments on RGB-D and RGB-T SOD datasets validate that the proposed cognition-inspired framework achieves competitive performance with good generalization and robustness.
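The abstract does not specify how the DFFM computes its weights, so the following is only an illustrative sketch of the general idea of similarity-weighted fusion, not the authors' module. It assumes per-location feature vectors, cosine similarity as the semantic-similarity measure, and a simple rule that down-weights the auxiliary (depth or thermal) feature where it disagrees with the RGB feature. The function names `cosine_similarity` and `similarity_weighted_fusion` and the weighting rule are hypothetical.

```python
import math


def cosine_similarity(u, v, eps=1e-8):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + eps)


def similarity_weighted_fusion(rgb_feats, aux_feats):
    """Fuse RGB and auxiliary features location by location.

    rgb_feats, aux_feats: lists of per-location channel vectors.
    Assumed rule (not from the paper): the auxiliary gate is
    g = (1 + cos_sim) / 2, and the normalized weights are
    w_rgb = 1 / (1 + g) and w_aux = g / (1 + g).
    Returns the fused vectors and the (w_rgb, w_aux) pair per location.
    """
    fused, weights = [], []
    for f_rgb, f_aux in zip(rgb_feats, aux_feats):
        sim = cosine_similarity(f_rgb, f_aux)
        gate = 0.5 * (1.0 + sim)          # map [-1, 1] to [0, 1]
        w_rgb = 1.0 / (1.0 + gate)
        w_aux = gate / (1.0 + gate)
        fused.append([w_rgb * a + w_aux * b for a, b in zip(f_rgb, f_aux)])
        weights.append((w_rgb, w_aux))
    return fused, weights


if __name__ == "__main__":
    rgb = [[1.0, 0.0], [1.0, 0.0]]
    aux = [[0.0, 1.0], [-1.0, 0.0]]   # orthogonal, then opposed
    fused, weights = similarity_weighted_fusion(rgb, aux)
    for f, w in zip(fused, weights):
        print(f, w)
```

Under this assumed rule, an auxiliary feature that points in the opposite direction to the RGB feature receives a weight near zero, while one that agrees with it is weighted equally. A learned module would normally replace the fixed gate with trainable parameters.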