Lookup table
Field-programmable gate array
Computer science
Convolutional neural network
Quantization (signal processing)
Scalability
Digital signal processing
Multiplier (economics)
Computer hardware
Computer engineering
Algorithm
Artificial intelligence
Database
Economy
Macroeconomics
Programming language
Authors
Bingrui Zhao, Yaonan Wang, Hui Zhang, Jinzhou Zhang, Yurong Chen, Yimin Yang
Identifier
DOI: 10.1109/tim.2023.3324357
Abstract
To address the challenge of deploying Convolutional Neural Networks (CNNs) on edge devices with limited resources, this paper presents an effective 4-bit quantization scheme for CNNs and proposes a DSP-free multiplier solution for deploying quantized neural networks on Field-Programmable Gate Array (FPGA) devices. Specifically, we first introduce a Threshold-Aware Quantization (TAQ) method with a mixed rounding strategy that compresses the model while maintaining the accuracy of the original full-precision model. Experimental results demonstrate that the proposed quantization method retains high classification accuracy for 4-bit quantized CNN models. Additionally, we propose a Compact Lookup-based Multiplier (CLM) design that replaces numerical multiplication with a lookup table of precomputed 4-bit products, leveraging LUT6 resources instead of scarce DSP blocks to improve the scalability of FPGAs for multiplication-intensive CNN algorithms. The proposed 4-bit CLM consumes only 13 LUT6 resources, surpassing existing LUT-based multipliers in resource efficiency. Together, the proposed quantization and CLM schemes substantially reduce FPGA resource consumption on image classification tasks, providing strong support for deep learning algorithms in unmanned systems, industrial inspection, and other vision and measurement scenarios running on DSP-constrained edge devices.
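The core idea behind the CLM can be illustrated with a minimal sketch: since both operands are 4-bit signed integers, there are only 16 × 16 = 256 possible products, so every multiplication can be replaced by a table lookup. The sketch below is illustrative only (names like `lut_mul` and `PRODUCT_TABLE` are hypothetical); the authors' actual design maps this table onto FPGA LUT6 primitives rather than software arrays.

```python
# Illustrative sketch of a lookup-based 4-bit multiplier (not the authors'
# actual FPGA implementation, which packs the table into LUT6 primitives).

# 4-bit signed operands span -8..7 (16 values each).
VALUES = range(-8, 8)

# 16x16 table of all precomputed products, indexed by offset-encoded operands.
PRODUCT_TABLE = [[a * b for b in VALUES] for a in VALUES]

def lut_mul(a: int, b: int) -> int:
    """Multiply two 4-bit signed integers via table lookup."""
    assert -8 <= a <= 7 and -8 <= b <= 7, "operands must fit in 4 signed bits"
    return PRODUCT_TABLE[a + 8][b + 8]  # offset-encode -8..7 to indices 0..15

# Sanity check: the lookup agrees with direct multiplication everywhere.
assert all(lut_mul(a, b) == a * b for a in VALUES for b in VALUES)
```

In hardware, this trades arithmetic logic (or a DSP slice) for a small amount of LUT storage, which is exactly the resource trade the paper exploits on DSP-constrained devices.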