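Field-programmable gate array
Quantization (signal processing)
Computer science
Pruning
Inference
Artificial intelligence
Computer architecture
Machine learning
Algorithm
Embedded system
Agronomy
Biology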
Authors
Zhang Xian, Guoqing Xiao, Mingxing Duan, Yuedan Chen, Kenli Li
Identifier
DOI: 10.1109/tsusc.2024.3382157
Abstract
Convolutional neural networks (CNNs) are widely used in intelligent edge computing applications such as computer vision and image processing. However, as CNN models grow deeper, their parameter counts and computational loads increase, making them increasingly difficult to accelerate in edge computing applications. To balance the speed and accuracy of CNN inference in smart applications, this paper proposes APPQ-CNN, an FPGA-based adaptive CNN inference accelerator that synergistically combines filter pruning, fixed-point parameter quantization, and parallelism across multiple computation units. First, the paper devises a hybrid pruning algorithm that measures the impact of each filter using the L1-norm and APoZ (Average Percentage of Zeros), together with a configurable fixed-point quantization architecture that replaces floating-point computation. Then, it designs a cascaded, pipelined CNN kernel architecture with a configurable number of computation units. Finally, it conducts extensive performance exploration and comparison experiments on various real and synthetic datasets. With negligible accuracy loss, APPQ-CNN achieves speedups of 2.15x and 1.91x over the state-of-the-art FPGA-based accelerators PipeCNN and OctCNN, respectively. Furthermore, APPQ-CNN exposes configurable fixed-point quantization bit-widths, filter pruning rates, and computation unit counts to meet the performance requirements of practical edge computing applications.
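The abstract names a hybrid L1-norm/APoZ filter-scoring step and a configurable fixed-point quantization step but gives no formulas. The Python sketch that follows illustrates one plausible reading, under stated assumptions: each filter is scored by a weighted mix of its max-normalized L1-norm and (1 - APoZ), with a hypothetical mixing weight `alpha`, the lowest-scoring fraction `prune_rate` of filters is selected for removal, and parameters are mapped to signed fixed-point integer codes with a configurable total bit-width and fractional bit count. All function names, `alpha`, the normalization, and the default 8-bit/4-fractional-bit split are illustrative assumptions, not APPQ-CNN's actual design.

```python
from typing import List, Sequence

# Hypothetical helpers illustrating the two compression steps named in the
# abstract; the paper's actual scoring formula and bit-width split may differ.


def l1_norms(filters: Sequence[Sequence[float]]) -> List[float]:
    """L1 norm of each filter, with its weights flattened into one sequence."""
    return [sum(abs(w) for w in f) for f in filters]


def apoz(activations: Sequence[Sequence[float]]) -> List[float]:
    """Average Percentage of Zeros: fraction of each filter's post-ReLU
    outputs (collected over calibration inputs) that are exactly zero."""
    return [sum(1 for a in acts if a == 0.0) / len(acts) for acts in activations]


def hybrid_scores(filters, activations, alpha: float = 0.5) -> List[float]:
    """Assumed hybrid importance: alpha * (L1 / max L1) + (1 - alpha) * (1 - APoZ).
    A higher score marks a filter as more important."""
    norms = l1_norms(filters)
    max_norm = max(norms) or 1.0  # avoid dividing by zero if all filters are zero
    zeros = apoz(activations)
    return [alpha * n / max_norm + (1 - alpha) * (1 - z) for n, z in zip(norms, zeros)]


def filters_to_prune(filters, activations, prune_rate: float, alpha: float = 0.5) -> List[int]:
    """Indices (ascending) of the floor(prune_rate * n) lowest-scoring filters."""
    scores = hybrid_scores(filters, activations, alpha)
    k = int(prune_rate * len(scores))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i])
    return sorted(ranked[:k])


def quantize_fixed_point(values: Sequence[float], total_bits: int = 8,
                         frac_bits: int = 4) -> List[int]:
    """Signed fixed-point codes with `frac_bits` fractional bits, saturated to
    the two's-complement range of `total_bits`. Python's round() rounds ties
    to even; a hardware datapath may round differently."""
    scale = 1 << frac_bits
    q_min = -(1 << (total_bits - 1))
    q_max = (1 << (total_bits - 1)) - 1
    return [max(q_min, min(q_max, round(v * scale))) for v in values]


def dequantize(codes: Sequence[int], frac_bits: int = 4) -> List[float]:
    """Map integer codes back to real values (code * 2**-frac_bits)."""
    return [c / (1 << frac_bits) for c in codes]


# Toy example: 4 filters, each with 2 weights and 4 recorded activations.
filters = [[0.5, -0.5], [0.01, 0.02], [1.0, -1.0], [0.2, 0.1]]
activations = [[1.0, 0.0, 2.0, 3.0], [0.0, 0.0, 0.0, 1.0],
               [0.0, 4.0, 5.0, 6.0], [0.0, 0.0, 1.0, 0.0]]
print(filters_to_prune(filters, activations, prune_rate=0.5))  # [1, 3]
print(quantize_fixed_point([0.3, -1.26, 10.0, -9.0]))          # [5, -20, 127, -128]
```

In this toy run, filters 1 and 3 combine small weights with mostly zero activations, so they are the ones selected at a 50% pruning rate; values outside the 8-bit range saturate to 127 and -128. Changing `prune_rate`, `total_bits`, or `frac_bits` mirrors the kind of settable parameters the abstract describes, though how APPQ-CNN maps them onto FPGA resources is not shown here.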