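Encoder
Salient
Transformer
Artificial intelligence
Computer science
Feature (linguistics)
Computer vision
RGB color model
Pattern recognition (psychology)
Engineering
Electrical engineering
Voltage
Linguistics
Philosophy
Operating system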
Authors
Shuang Wu, Guangjian Zhang, Xuefeng Liu
Identifier
DOI:10.1016/j.imavis.2024.105039
Abstract
The Transformer architecture has achieved excellent performance across a wide range of computer vision applications, and the Swin-Transformer in particular shows strong feature representation capabilities. On this basis, we propose SwinSOD, a fusion model for RGB salient object detection. The model uses a Swin-Transformer as the encoder to extract hierarchical features, applies a multi-head attention mechanism to bridge the gap between hierarchical features, progressively fuses feature information from adjacent layers under the guidance of global information, and refines the boundaries of salient objects using feedback information. Specifically, the Swin-Transformer encoder extracts multi-level features, and the channels are then recalibrated to optimize intra-layer channel features. The feature fusion module fuses features across layers under the guidance of global information. To sharpen fuzzy boundaries, a second-stage feature fusion performs edge refinement under the guidance of feedback information. The proposed model outperforms state-of-the-art models on five popular SOD datasets. Code is available at https://github.com/user-wu/SwinSOD.
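The channel recalibration step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how SwinSOD computes its channel weights, so this assumes a squeeze-and-excitation style gate (global average pooling, a linear map, then a sigmoid). The function name `channel_recalibrate` and the `weights` matrix are hypothetical and are not taken from the released code.

```python
import math

def channel_recalibrate(features, weights):
    """Rescale each channel by a sigmoid gate computed from global channel averages.

    features: list of C channels, each a list of rows (H x W) of floats.
    weights:  C x C matrix (hypothetical learned parameters) mixing the pooled values.
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    pooled = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in features
    ]
    # Excite: a linear map followed by a sigmoid yields a gate in (0, 1) per channel.
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * p for w, p in zip(w_row, pooled))))
        for w_row in weights
    ]
    # Recalibrate: multiply every value in a channel by that channel's gate.
    scaled = [
        [[value * gate for value in row] for row in channel]
        for channel, gate in zip(features, gates)
    ]
    return scaled, gates

# Toy input: 2 channels of size 2 x 2, with an identity mixing matrix (assumed values).
features = [
    [[1.0, 1.0], [1.0, 1.0]],
    [[0.0, 0.0], [0.0, 0.0]],
]
weights = [[1.0, 0.0], [0.0, 1.0]]
scaled, gates = channel_recalibrate(features, weights)
```

In this toy case the channel with the larger average activation receives the larger gate, which is the sense in which such a gate reweights channels within a layer. A real model would learn `weights` and would typically operate on tensors in a deep learning framework rather than nested lists.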