Concepts
Computer science
Security token
Artificial intelligence
Entropy (arrow of time)
Transformer
Inference
Associative array
Edge device
Pruning
Memory footprint
Machine learning
Data mining
Pattern recognition (psychology)
Biology
Agronomy
Operating system
Physics
Cloud computing
Voltage
Quantum mechanics
Computer security
Authors
Junzhu Mao, Yazhou Yao, Zeren Sun, Xingguo Huang, Fumin Shen, Heng Tao Shen
Identifier
DOI:10.1109/tmm.2023.3265159
Abstract
Due to its significant capability of modeling long-range dependencies, the vision transformer (ViT) has achieved promising success in both holistic and occluded person re-identification (Re-ID) tasks. However, inherent problems of transformers, such as their huge computational cost and memory footprint, remain two unsolved issues that block the deployment of ViT-based person Re-ID models on resource-limited edge devices. Our goal is to reduce both the inference complexity and the model size while maintaining comparable accuracy on person Re-ID, especially for tasks with occlusion. To this end, we propose a novel attention map guided (AMG) transformer pruning method, which removes both redundant tokens and heads under the guidance of the attention map in a hardware-friendly way. We first calculate the entropy along the key dimension and sum it up over the whole map; the head parameters corresponding to maps with high entropy are then removed to reduce model size. Next, we combine the similarity and first-order gradients of key tokens along the query dimension to estimate token importance, and remove redundant key and value tokens to further reduce inference complexity. Comprehensive experiments on Occluded DukeMTMC and Market-1501 demonstrate the effectiveness of our proposals. For example, our pruning strategy applied to ViT-Base yields 29.4% FLOPs savings with a 0.2% drop in Rank-1 and a 0.4% improvement in mAP. Code and models have been made available at https://github.com/NUST-Machine-Intelligence-Laboratory/AMG.
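The two scoring steps the abstract describes can be sketched compactly. The snippet below is a minimal illustration, not the authors' released implementation (see the linked repository for that): it assumes attention maps of shape (heads, queries, keys) whose rows are softmax-normalised over the key dimension, and the unweighted sum used to combine the similarity and gradient terms is a simplifying assumption.

```python
import torch

def head_entropy_scores(attn: torch.Tensor) -> torch.Tensor:
    """Score each head by the total entropy of its attention map.

    attn: (heads, queries, keys), rows normalised over the key dimension.
    Entropy is computed along the key dimension and summed over the whole
    map; heads with HIGH total entropy (diffuse, unfocused attention) are
    candidates for removal.
    """
    eps = 1e-12                                     # numerical safety for log(0)
    ent = -(attn * (attn + eps).log()).sum(dim=-1)  # (heads, queries)
    return ent.sum(dim=-1)                          # (heads,)

def key_token_importance(attn: torch.Tensor, attn_grad: torch.Tensor) -> torch.Tensor:
    """Score key tokens via attention mass plus a first-order (Taylor) term.

    attn, attn_grad: (heads, queries, keys). The attention a key token
    receives along the query dimension serves as the similarity term;
    |attn * d(loss)/d(attn)| approximates the loss change if the token
    were dropped. Low-scoring key/value tokens can be pruned together.
    """
    taylor = (attn * attn_grad).abs()
    return (attn + taylor).sum(dim=(0, 1))          # (keys,)

# Toy usage: drop the 2 highest-entropy heads, keep the top-128 key tokens.
heads, queries, keys = 12, 197, 197
attn = torch.softmax(torch.randn(heads, queries, keys), dim=-1)
grad = torch.randn_like(attn)                       # stand-in for a real backward pass
drop_heads = head_entropy_scores(attn).topk(2).indices
keep_keys = key_token_importance(attn, grad).topk(128).indices
```

Pruning at this granularity is what makes the method hardware-friendly: whole heads and whole key/value rows are removed, so the surviving tensors stay dense and need no sparse kernels.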