Computer science
Convolutional neural network
Transformer
Artificial intelligence
Machine learning
Pattern recognition (psychology)
Data mining
Engineering
Electrical engineering
Voltage
Authors
C. Jiang, Chen Wang, Hanxiang Zhang, Richard Li
Source
Journal: Research Square
Date: 2024-06-06
Identifier
DOI: 10.21203/rs.3.rs-4447366/v1
Abstract
Visual Transformers (VTs) are increasingly popular in computer vision thanks to their strong global modeling, but they lack the learning advantages of Convolutional Neural Networks (CNNs), which can be trained effectively on less data. This paper presents a simple "query attention" module that adapts VTs to small datasets. The module combines channel and spatial information to improve results without pre-training, showing strong learning ability while making the model more efficient. Even with a reduced VT backbone, the module outperforms competing methods. We evaluated the approach on four small datasets (CIFAR10/100, CINIC10, and Tiny-ImageNet) and found consistent improvements; for instance, using fewer ViT layers together with query attention reduced parameters by 20% and increased accuracy by 4.26% on Tiny-ImageNet.
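The abstract does not specify the internals of the query attention module, only that it mixes channel and spatial information on top of a ViT backbone. As a rough illustration, the following minimal PyTorch sketch shows one plausible way such a block could gate channels with SE-style statistics and reweight tokens against a single learned query; every name and design choice here (QueryAttention, the channel MLP, the learned query vector) is an assumption for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn


class QueryAttention(nn.Module):
    """Hypothetical sketch of a query-attention block that combines
    channel and spatial (token) statistics. The paper's abstract does
    not give the exact design, so all details below are assumptions."""

    def __init__(self, dim: int, reduction: int = 4):
        super().__init__()
        # Channel branch: squeeze the token dimension, excite channels
        # (SE-style gate over the embedding dimension).
        self.channel_mlp = nn.Sequential(
            nn.Linear(dim, dim // reduction),
            nn.GELU(),
            nn.Linear(dim // reduction, dim),
            nn.Sigmoid(),
        )
        # Spatial branch: score each token against one learned query vector.
        self.query = nn.Parameter(torch.randn(dim) * dim ** -0.5)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, dim), as produced by a ViT encoder block.
        channel_gate = self.channel_mlp(x.mean(dim=1))        # (B, C)
        spatial_gate = torch.softmax(x @ self.query, dim=-1)  # (B, N)
        out = x * channel_gate.unsqueeze(1)                   # rescale channels
        out = out * spatial_gate.unsqueeze(-1) * x.size(1)    # reweight tokens
        return x + out                                        # residual connection


if __name__ == "__main__":
    block = QueryAttention(dim=192)
    tokens = torch.randn(2, 64, 192)  # e.g. 8x8 patch tokens from CIFAR10
    print(block(tokens).shape)        # torch.Size([2, 64, 192])
```

Such a block is cheap relative to full multi-head self-attention (one small MLP plus a dot product per token), which is consistent with the abstract's claim that the module can replace some ViT layers while reducing parameters.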