Computer science
Window (computing)
Dimension (graph theory)
Convolution (computer science)
Receptive field
Coding (set theory)
Transformer
Feature extraction
Feature (linguistics)
Artificial intelligence
Pattern recognition (psychology)
Algorithm
Mathematics
Artificial neural network
Set (abstract data type)
Engineering
Philosophy
Electrical engineering
Operating system
Voltage
Programming language
Pure mathematics
Linguistics
Authors
Qiang Chen, Qiman Wu, Jian Wang, Qinghao Hu, Tao Hu, Errui Ding, Jian Cheng, Jingdong Wang
Identifier
DOI: 10.1109/cvpr52688.2022.00518
Abstract
While local-window self-attention performs notably in vision tasks, it suffers from limited receptive field and weak modeling capability issues. This is mainly because it performs self-attention within non-overlapped windows and shares weights on the channel dimension. We propose MixFormer to find a solution. First, we combine local-window self-attention with depth-wise convolution in a parallel design, modeling cross-window connections to enlarge the receptive fields. Second, we propose bi-directional interactions across branches to provide complementary clues in the channel and spatial dimensions. These two designs are integrated to achieve efficient feature mixing among windows and dimensions. MixFormer achieves image-classification results competitive with EfficientNet and better than RegNet and Swin Transformer. On downstream tasks, it outperforms its alternatives by significant margins at lower computational cost across 5 dense prediction tasks on MS COCO, ADE20K, and LVIS. Code is available at https://github.com/PaddlePaddle/PaddleClas.
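The abstract's two ideas, a parallel window-attention/depth-wise-convolution design and bi-directional interactions between the two branches, can be sketched compactly. The following is a minimal PyTorch sketch for illustration only, not the authors' implementation (the official code is PaddlePaddle-based, at the repository linked above): the module name, the SE-style channel gate, the per-pixel spatial gate, the window size, and the concatenation-based fusion are all assumptions made for this example.

```python
# Illustrative sketch of the parallel design described in the abstract.
# NOT the authors' implementation; all names and gate designs are assumptions.
import torch
import torch.nn as nn

class MixingBlockSketch(nn.Module):
    """Parallel local-window self-attention and depth-wise convolution,
    with simple bi-directional channel/spatial interactions between branches."""

    def __init__(self, dim, window_size=7, num_heads=4):
        super().__init__()
        self.window_size = window_size
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Depth-wise 3x3 convolution branch: mixes features across
        # neighboring windows, enlarging the effective receptive field.
        self.dwconv = nn.Conv2d(dim, dim, 3, padding=1, groups=dim)
        # Channel interaction (conv branch -> attention branch): SE-style gate.
        self.channel_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(dim, dim, 1), nn.Sigmoid())
        # Spatial interaction (attention branch -> conv branch): per-pixel gate.
        self.spatial_gate = nn.Sequential(nn.Conv2d(dim, 1, 1), nn.Sigmoid())
        self.proj = nn.Conv2d(2 * dim, dim, 1)  # fuse the two branches

    def window_attention(self, x):
        # x: (B, C, H, W) with H, W divisible by the window size.
        # Partition into non-overlapping windows and attend within each window.
        B, C, H, W = x.shape
        w = self.window_size
        x = x.view(B, C, H // w, w, W // w, w)
        x = x.permute(0, 2, 4, 3, 5, 1).reshape(-1, w * w, C)  # (B*nW, w*w, C)
        x, _ = self.attn(x, x, x)
        x = x.view(B, H // w, W // w, w, w, C)
        return x.permute(0, 5, 1, 3, 2, 4).reshape(B, C, H, W)

    def forward(self, x):
        conv_out = self.dwconv(x)
        # Bi-directional interactions: each branch modulates the other.
        attn_in = x * self.channel_gate(conv_out)          # channel clue -> attention
        attn_out = self.window_attention(attn_in)
        conv_out = conv_out * self.spatial_gate(attn_out)  # spatial clue -> conv
        return self.proj(torch.cat([attn_out, conv_out], dim=1))

# Usage: spatial size must be a multiple of the (assumed) window size.
block = MixingBlockSketch(dim=64, window_size=7, num_heads=4)
x = torch.randn(2, 64, 28, 28)
print(block(x).shape)  # torch.Size([2, 64, 28, 28])
```

The point of the sketch is the data flow: the convolution branch sees across window boundaries, its pooled channel statistics re-weight the attention input, and the attention output spatially gates the convolution output before the two are fused.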