Computer science
Artificial intelligence
Hessian matrix
Computer vision
Quantization (signal processing)
Mathematics
Applied mathematics
Authors
Weixing Zhang,Zhuang Tian,Nan Lin,Cong Yang,Yongxia Chen
Identifiers
DOI:10.1117/1.jei.34.1.013009
Abstract
In recent years, vision transformers (ViTs) have made significant breakthroughs in computer vision and have demonstrated great potential in large-scale models. However, quantization methods developed for convolutional neural networks do not transfer well to ViT models and cause a significant drop in accuracy when applied to them. We extend a Hessian-matrix-based quantization parameter optimization method and apply it to the quantization of the LayerNorm module in ViT models. This approach reduces the impact of quantizing the LayerNorm module on task accuracy and enables more comprehensive quantization of ViT models. To achieve fast quantization of ViT models, we propose a quantization framework specifically designed for them: Hessian matrix–aware post-training quantization for vision transformers (HAPTQ). Experimental results on various models and datasets demonstrate that, after quantizing the LayerNorm module of various ViT models, HAPTQ achieves lossless quantization (an accuracy drop of less than 1%) on ImageNet classification tasks. In particular, HAPTQ reaches 85.81% top-1 accuracy on the ViT-L model.
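The abstract does not spell out the optimization objective, so the following is only a minimal illustrative sketch of Hessian-aware scale search as used in prior Hessian-guided post-training quantization work: the quantization scale for a LayerNorm output is chosen by minimizing a diagonal-Hessian-weighted quantization error rather than plain MSE. All function names, shapes, and the Hessian-diagonal estimate below are hypothetical and not taken from the paper.

```python
import torch

def quantize(x, scale, num_bits=8):
    # Uniform symmetric quantization: map to an integer grid, then de-quantize.
    qmax = 2 ** (num_bits - 1) - 1
    q = torch.clamp(torch.round(x / scale), -qmax - 1, qmax)
    return q * scale

def hessian_aware_scale_search(x, hess_diag, num_bits=8, num_candidates=100):
    """Pick a quantization scale for a LayerNorm output tensor `x` by
    minimizing sum_i H_ii * (x_i - Q(x_i))^2, i.e. quantization error
    weighted by an estimate of the task-loss Hessian diagonal.
    `hess_diag` is a hypothetical input (e.g., squared gradients collected
    on a small calibration set)."""
    qmax = 2 ** (num_bits - 1) - 1
    max_scale = x.abs().max() / qmax
    best_scale, best_err = max_scale, float("inf")
    # Grid search over candidate scales between 0 and the max-range scale.
    for k in range(1, num_candidates + 1):
        scale = max_scale * k / num_candidates
        err = (hess_diag * (x - quantize(x, scale, num_bits)) ** 2).sum()
        if err.item() < best_err:
            best_scale, best_err = scale, err.item()
    return best_scale

# Toy usage (shapes are illustrative only):
x = torch.randn(16, 197, 768)       # LayerNorm output for one calibration batch
hess_diag = torch.rand_like(x)      # stand-in for a Hessian-diagonal estimate
scale = hessian_aware_scale_search(x, hess_diag)
print(f"selected scale: {scale:.6f}")
```

The Hessian weighting matters because LayerNorm outputs feed directly into attention and MLP blocks, so dimensions to which the task loss is more sensitive should incur less quantization error than a plain MSE criterion would allow.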