Computer Science
Modality
Transformer
Salience
Artificial Intelligence
Pattern Recognition (Psychology)
Computer Vision
Voltage
Engineering
Electrical Engineering
Chemistry
Polymer Chemistry
Authors
Pengfei Lyu,Xiaosheng Yu,Jianning Chi,Hao Wu,Chengdong Wu,Jagath C. Rajapakse
Identifier
DOI: 10.1109/tip.2025.3564821
Abstract
Exploring complementary information between RGB and thermal/depth modalities is crucial for bi-modal salient object detection (BSOD). However, the distinct characteristics of different modalities often lead to large differences in information distributions. Existing models, which rely on convolutional operations or plug-and-play attention mechanisms, struggle to address this issue. To overcome this challenge, we rethink the relationship between information complementarity and long-range relevance, and propose a uniform broad-view Twins Transformer Network (TwinsTNet) for accurate BSOD. Specifically, to efficiently fuse bi-modal information, we first design the Cross-Modal Federated Attention (CMFA), which mines complementary cues across modalities through element-wise global dependency. Second, to ensure accurate modality fusion, we propose the Semantic Consistency Attention Loss, which supervises the co-attention feature in CMFA using a ground-truth-generated attention map. Additionally, existing BSOD models lack exploration of inter-layer interactions, for which we propose the Cross-Scale Retracing Attention (CSRA), which retrieves query-relevant information from stacked features of all previous layers, enabling flexible cross-layer interactions. The cooperation between CMFA and CSRA mitigates inductive bias in both the modality and layer dimensions, enhancing TwinsTNet's representational capability. Extensive experiments demonstrate that TwinsTNet outperforms twenty-two existing state-of-the-art models on ten BSOD benchmark datasets. The code is available at: https://github.com/JoshuaLPF/TwinsTNet.
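To make the cross-modal fusion idea concrete, the following is a minimal, hypothetical sketch of bi-directional cross-modal attention between token features of two modalities: each modality queries the other so that complementary cues flow both ways. This is an illustrative assumption only; the actual CMFA design (element-wise global dependency, co-attention supervision) differs and is defined in the paper and repository, and the function and variable names here (`cross_modal_attention`, `f_rgb`, `f_t`) are invented for the sketch.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(f_rgb, f_t):
    """Hypothetical sketch: RGB tokens attend over thermal tokens and
    vice versa; each stream keeps a residual of its own features."""
    d = f_rgb.shape[-1]
    scale = 1.0 / np.sqrt(d)
    a_rt = softmax(f_rgb @ f_t.T * scale)   # (N_rgb, N_t) attention weights
    a_tr = softmax(f_t @ f_rgb.T * scale)   # (N_t, N_rgb) attention weights
    fused_rgb = f_rgb + a_rt @ f_t          # inject complementary thermal cues
    fused_t = f_t + a_tr @ f_rgb            # inject complementary RGB cues
    return fused_rgb, fused_t

rng = np.random.default_rng(0)
f_rgb = rng.standard_normal((16, 64))       # 16 RGB tokens, 64-d each
f_t = rng.standard_normal((16, 64))         # 16 thermal tokens, 64-d each
out_rgb, out_t = cross_modal_attention(f_rgb, f_t)
```

Because every query token can attend to every token of the other modality, dependencies are global rather than limited to a local convolutional window, which is the property the abstract contrasts against convolution-based fusion.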