Computer science
Convolutional neural network
Unary operation
Segmentation
Artificial intelligence
Inference
Transformer
Convolution (computer science)
Pattern recognition (psychology)
Rectangle
Image segmentation
Computer vision
Algorithm
Artificial neural network
Mathematics
Engineering
Voltage
Combinatorics
Electrical engineering
Geometry
Authors
Haoran Duan, Yang Long, Shidong Wang, Haofeng Zhang, Chris G. Willcocks, Ling Shao
Identifiers
DOI: 10.1109/tpami.2022.3233482
Abstract
It is uncertain whether the power of transformer architectures can complement existing convolutional neural networks. A few recent attempts have combined convolution with transformer design through a range of structures in series, where the main contribution of this paper is to explore a parallel design approach. While previous transformer-based approaches need to segment the image into patch-wise tokens, we observe that the multi-head self-attention conducted on convolutional features is mainly sensitive to global correlations and that the performance degrades when these correlations are not exhibited. We propose two parallel modules along with multi-head self-attention to enhance the transformer. For local information, a dynamic local enhancement module leverages convolution to dynamically and explicitly enhance positive local patches and suppress the response to less informative ones. For mid-level structure, a novel unary co-occurrence excitation module utilizes convolution to actively search the local co-occurrence between patches. The parallel-designed Dynamic Unary Convolution in Transformer (DUCT) blocks are aggregated into a deep architecture, which is comprehensively evaluated across essential computer vision tasks in image-based classification, segmentation, retrieval, and density estimation. Both qualitative and quantitative results show that our parallel convolutional-transformer approach with dynamic and unary convolution outperforms existing series-designed structures.
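To make the parallel design concrete, the following is a minimal PyTorch sketch of a block in the spirit of the abstract: a multi-head self-attention branch for global correlations, a convolutional gating branch standing in for the "dynamic local enhancement" module, and a pointwise ("unary") convolution branch standing in for the "unary co-occurrence excitation" module, combined additively. All internal details here (kernel sizes, sigmoid gating, the 1x1 convolutions, and the summation) are illustrative assumptions, not the authors' published implementation.

```python
# Hypothetical sketch of a parallel convolution-transformer block inspired
# by the DUCT description in the abstract; module roles follow the abstract,
# but every implementation detail below is an assumption for illustration.
import torch
import torch.nn as nn


class ParallelDUCTBlock(nn.Module):
    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        # Global branch: multi-head self-attention over flattened patch tokens.
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Local branch (assumed form of "dynamic local enhancement"): a
        # depthwise convolution modulated by a learned sigmoid gate, so that
        # informative local patches are enhanced and the rest suppressed.
        self.local_conv = nn.Conv2d(dim, dim, 3, padding=1, groups=dim)
        self.local_gate = nn.Sequential(nn.Conv2d(dim, dim, 1), nn.Sigmoid())
        # Mid-level branch (assumed form of "unary co-occurrence excitation"):
        # pointwise convolutions that re-weight co-occurring channel responses.
        self.unary = nn.Sequential(
            nn.Conv2d(dim, dim, 1), nn.GELU(), nn.Conv2d(dim, dim, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, height, width) convolutional feature map.
        b, c, h, w = x.shape
        tokens = self.norm(x.flatten(2).transpose(1, 2))  # (b, h*w, c)
        global_out, _ = self.attn(tokens, tokens, tokens)
        global_out = global_out.transpose(1, 2).reshape(b, c, h, w)
        local_out = self.local_conv(x) * self.local_gate(x)
        unary_out = self.unary(x)
        # Parallel aggregation: the three branches are combined additively
        # with a residual connection, rather than stacked in series.
        return x + global_out + local_out + unary_out


if __name__ == "__main__":
    block = ParallelDUCTBlock(dim=64)
    feat = torch.randn(2, 64, 14, 14)
    print(block(feat).shape)  # torch.Size([2, 64, 14, 14])
```

The key design point the sketch illustrates is that all three branches consume the same convolutional feature map and their outputs are summed, in contrast to the series designs the paper compares against, where convolution and attention are applied one after the other.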