Computer science
Convolutional neural network
Transformer
Artificial intelligence
Encoder
Salience
Pattern recognition (psychology)
Object detection
Leverage (statistics)
Iterative method
Data mining
Algorithm
Engineering
Operating system
Electrical engineering
Voltage
Authors
Junbin Yuan,Aiqing Zhu,Qingzhen Xu,Kanoksak Wattanachote,Yongyi Gong
Identifier
DOI:10.1109/tcsvt.2023.3321190
Abstract
Capturing sufficient global context and rich spatial structure information is critical for dense prediction tasks. Convolutional Neural Networks (CNNs) are particularly adept at modeling fine-grained local features, while Transformers excel at modeling global context; the two architectures are therefore complementary. Designing a network that efficiently fuses these two models, fully leveraging their strengths to achieve more accurate detection, is a promising and worthwhile research topic. In this paper, we introduce a novel CNN-Transformer Iterative Fusion Network (CTIF-Net) for salient object detection. It combines a CNN and a Transformer through a parallel dual-encoder structure and a feature iterative fusion module to achieve superior performance. First, CTIF-Net extracts two sets of features from the input image with the CNN encoder and the Transformer encoder, respectively. Then, two feature convertors and a feature iterative fusion module combine and iteratively refine the two sets of features. Experimental results on multiple SOD datasets show that CTIF-Net outperforms 17 state-of-the-art methods, achieving better scores on mainstream evaluation metrics such as F-measure, S-measure, and MAE. The code will be publicly available.
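To make the dual-encoder idea concrete, below is a minimal PyTorch sketch of a parallel CNN/Transformer encoder pair whose features are exchanged over several fusion iterations before a saliency head. All module choices here (the toy convolutional and patch-based Transformer branches, the 1x1-conv "convertors", the number of fusion iterations, and all channel sizes) are illustrative assumptions and not the exact CTIF-Net design described in the paper.

```python
# Sketch of a parallel dual encoder with iterative feature fusion for
# salient object detection. Hypothetical architecture, not CTIF-Net itself.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ConvEncoder(nn.Module):
    """Toy CNN branch: captures fine-grained local features."""
    def __init__(self, in_ch=3, dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, dim, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(dim, dim, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.net(x)                       # (B, dim, H/4, W/4)


class TransEncoder(nn.Module):
    """Toy Transformer branch: models global context over image patches."""
    def __init__(self, in_ch=3, dim=64, patch=4, depth=2, heads=4):
        super().__init__()
        self.patch_embed = nn.Conv2d(in_ch, dim, kernel_size=patch, stride=patch)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, x):
        tokens = self.patch_embed(x)             # (B, dim, H/4, W/4)
        b, c, h, w = tokens.shape
        seq = self.encoder(tokens.flatten(2).transpose(1, 2))
        return seq.transpose(1, 2).reshape(b, c, h, w)


class IterativeFusion(nn.Module):
    """Repeatedly exchanges information between the two feature streams."""
    def __init__(self, dim=64, iters=3):
        super().__init__()
        self.iters = iters
        # "Convertors": 1x1 convs that project the fused map back to each stream.
        self.to_cnn = nn.Conv2d(dim, dim, 1)
        self.to_trans = nn.Conv2d(dim, dim, 1)
        self.fuse = nn.Conv2d(2 * dim, dim, 3, padding=1)

    def forward(self, f_cnn, f_trans):
        for _ in range(self.iters):
            fused = self.fuse(torch.cat([f_cnn, f_trans], dim=1))
            # Each stream is refined by the shared fused representation.
            f_cnn = f_cnn + self.to_cnn(fused)
            f_trans = f_trans + self.to_trans(fused)
        return f_cnn + f_trans


class DualEncoderSOD(nn.Module):
    """Parallel dual encoder + iterative fusion + 1-channel saliency head."""
    def __init__(self, dim=64):
        super().__init__()
        self.cnn = ConvEncoder(dim=dim)
        self.transformer = TransEncoder(dim=dim)
        self.fusion = IterativeFusion(dim=dim)
        self.head = nn.Conv2d(dim, 1, 1)

    def forward(self, x):
        fused = self.fusion(self.cnn(x), self.transformer(x))
        sal = self.head(fused)                   # coarse saliency logits
        return F.interpolate(sal, size=x.shape[-2:], mode="bilinear",
                             align_corners=False)


if __name__ == "__main__":
    model = DualEncoderSOD()
    out = model(torch.randn(2, 3, 224, 224))
    print(out.shape)                             # torch.Size([2, 1, 224, 224])
```

The residual updates inside the fusion loop are one simple way to let each stream absorb the other's information over several iterations; the actual refinement operators used by CTIF-Net are given in the paper and its code release.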