Computer science
Scalability
Transformer
Window (computing)
Sliding window protocol
Artificial intelligence
Data mining
Engineering
Electrical engineering
Voltage
Database
Operating system
Authors
Pengzhen Ren, Changlin Li, Guangrun Wang, Yun Xiao, Qing Du, Xiaodan Liang, Xiaojun Chang
Identifier
DOI: 10.1109/cvpr52688.2022.01168
Abstract
Recently, there has been a surge of interest in reducing the computational cost of visual transformers by limiting the calculation of self-attention to a local window. Most current work uses a fixed single-scale window for modeling by default, ignoring the impact of window size on model performance. However, this may limit the modeling potential of these window-based models for multi-scale information. In this paper, we propose a novel method, named Dynamic Window Vision Transformer (DW-ViT). The dynamic window strategy proposed by DW-ViT goes beyond models that employ a fixed single window setting. To the best of our knowledge, we are the first to use dynamic multi-scale windows to explore the upper limit of the effect of window settings on model performance. In DW-ViT, multi-scale information is obtained by assigning windows of different sizes to different head groups of window multi-head self-attention. Then, the information is dynamically fused by assigning different weights to the multi-scale window branches. We conducted a detailed performance evaluation on three datasets: ImageNet-1K, ADE20K, and COCO. Compared with related state-of-the-art (SoTA) methods, DW-ViT obtains the best performance. Specifically, compared with the current SoTA Swin Transformer [31], DW-ViT achieves consistent and substantial improvements on all three datasets with similar parameters and computational costs. In addition, DW-ViT exhibits good scalability and can be easily inserted into any window-based visual transformer.

Code release: https://github.com/pzhren/DW-ViT. This work was done when the first author interned at Dark Matter AI.
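To make the mechanism described in the abstract concrete, the following is a minimal PyTorch sketch, not the authors' released implementation: head groups attend within windows of different sizes, and the branch outputs are fused with dynamically predicted weights. The window sizes, the average-pooling gating head, the identity q/k/v "projections", and all layer names here are illustrative assumptions; see the official repository above for the actual DW-ViT code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def window_attention(x, window_size, num_heads):
    """Self-attention restricted to non-overlapping windows.

    x: (B, H, W, C) feature map; H and W are assumed divisible by window_size.
    """
    B, H, W, C = x.shape
    head_dim = C // num_heads
    # Partition into (B * num_windows, window_size*window_size, C) token groups.
    x = x.view(B, H // window_size, window_size, W // window_size, window_size, C)
    windows = x.permute(0, 1, 3, 2, 4, 5).reshape(-1, window_size * window_size, C)
    # Sketch only: identity q/k/v; a real block would use learned nn.Linear projections.
    q = k = v = windows.view(-1, window_size * window_size, num_heads, head_dim).transpose(1, 2)
    attn = (q @ k.transpose(-2, -1)) / head_dim ** 0.5
    out = (attn.softmax(dim=-1) @ v).transpose(1, 2).reshape(-1, window_size * window_size, C)
    # Reverse the window partition back to (B, H, W, C).
    out = out.view(B, H // window_size, W // window_size, window_size, window_size, C)
    return out.permute(0, 1, 3, 2, 4, 5).reshape(B, H, W, C)


class DynamicMultiScaleWindowAttention(nn.Module):
    """Illustrative multi-branch window attention with dynamic fusion weights."""

    def __init__(self, dim, window_sizes=(7, 14, 28), heads_per_branch=2):
        super().__init__()
        self.window_sizes = window_sizes
        self.heads_per_branch = heads_per_branch
        self.branch_dim = dim // len(window_sizes)
        # Small gating head that predicts one weight per window-size branch.
        self.gate = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                  nn.Linear(dim, len(window_sizes)))
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):  # x: (B, H, W, C)
        B, H, W, C = x.shape
        # One channel (head) group per window-size branch.
        chunks = x.split(self.branch_dim, dim=-1)
        branches = [window_attention(c, ws, self.heads_per_branch)
                    for c, ws in zip(chunks, self.window_sizes)]
        # Dynamic fusion: weight each branch by a softmax over predicted scores.
        weights = F.softmax(self.gate(x.permute(0, 3, 1, 2)), dim=-1)  # (B, num_branches)
        fused = torch.cat([b * weights[:, i].view(B, 1, 1, 1)
                           for i, b in enumerate(branches)], dim=-1)
        return self.proj(fused)


if __name__ == "__main__":
    feat = torch.randn(2, 28, 28, 96)  # toy feature map; 28 is divisible by every window size
    block = DynamicMultiScaleWindowAttention(dim=96)
    print(block(feat).shape)  # torch.Size([2, 28, 28, 96])
```

The split into channel groups mirrors the abstract's assignment of different window sizes to different head groups, while the gating head is one simple way to realize "assigning different weights to the multi-scale window branches".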