构象异构
计算机科学
人工智能
特征(语言学)
模式识别(心理学)
卷积神经网络
特征学习
代表(政治)
分割
特征提取
化学
语言学
哲学
有机化学
分子
政治
政治学
法学
作者
Zhiliang Peng,Wei Huang,Shanzhi Gu,Lingxi Xie,Yaowei Wang,Jianbin Jiao,Qixiang Ye
标识
DOI:10.1109/iccv48922.2021.00042
摘要
Within Convolutional Neural Network (CNN), the convolution operations are good at extracting local features but experience difficulty to capture global representations. Within visual transformer, the cascaded self-attention modules can capture long-distance feature dependencies but unfortunately deteriorate local feature details. In this paper, we propose a hybrid network structure, termed Conformer, to take advantage of convolutional operations and self-attention mechanisms for enhanced representation learning. Conformer roots in the Feature Coupling Unit (FCU), which fuses local features and global representations under different resolutions in an interactive fashion. Conformer adopts a concurrent structure so that local features and global representations are retained to the maximum extent. Experiments show that Conformer, under the comparable parameter complexity, outperforms the visual transformer (DeiT-B) by 2.3% on ImageNet. On MSCOCO, it outperforms ResNet-101 by 3.7% and 3.6% mAPs for object detection and instance segmentation, respectively, demonstrating the great potential to be a general backbone network. Code is available at github.com/pengzhiliang/Conformer.
科研通智能强力驱动
Strongly Powered by AbleSci AI