Computer science
Field-programmable gate array
Loop unrolling
Convolutional neural network
Quantization (signal processing)
Pruning
Reduction (mathematics)
Computer hardware
MNIST database
Digital signal processing
Embedded system
Coprocessor
Application-specific integrated circuit
Computer architecture
Parallel computing
Artificial neural network
Artificial intelligence
Algorithm
Geometry
Mathematics
Compiler
Agronomy
Biology
Programming language
Authors
Ahmed Elgohary, Omar A. Nasr
Identifier
DOI: 10.1109/niles59815.2023.10296773
Abstract
Convolutional Neural Networks (CNNs) have gained significant popularity in recent years due to their exceptional performance in classification tasks. However, the large number of computational operations these models require can limit their practicality. Field Programmable Gate Arrays (FPGAs) are a promising platform for accelerating CNNs because they allow custom hardware designs. In this paper, we present an efficient hardware implementation of a CNN processor: a generic processor designed and implemented to accelerate CNNs in hardware with a minimal number of operations and hardware resources. Our target is to reduce the required number of memory and Digital Signal Processing (DSP) blocks so that the design fits entirely in FPGA on-chip memory. Multiple optimization techniques, such as loop unrolling, weight pruning, and quantization, were used to achieve this target. The processor is evaluated on the LeNet-5 architecture using the MNIST dataset on a Virtex-7 FPGA, showing a 17.9x reduction in weight size and very competitive performance-to-resources efficiency compared with the literature.
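The abstract names weight pruning and quantization as the main levers for shrinking the model so it fits in on-chip memory, but does not detail the exact schemes used. The sketch below is only an illustration of the general techniques, assuming magnitude-based pruning and uniform symmetric 8-bit quantization; the layer shape, function name, and parameters are hypothetical and not taken from the paper.

```python
import numpy as np

def prune_and_quantize(weights, sparsity=0.5, num_bits=8):
    """Illustrative magnitude pruning followed by uniform symmetric quantization.

    This is a generic sketch, not the scheme described in the paper.
    """
    # Magnitude pruning: zero out the smallest-magnitude fraction of weights.
    threshold = np.quantile(np.abs(weights), sparsity)
    pruned = np.where(np.abs(weights) < threshold, 0.0, weights)

    # Uniform symmetric quantization to signed num_bits integers.
    qmax = 2 ** (num_bits - 1) - 1
    max_abs = np.max(np.abs(pruned))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = np.clip(np.round(pruned / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

# Example: a tensor shaped like the first LeNet-5 convolution layer
# (6 output channels, 1 input channel, 5x5 kernels); values are random.
rng = np.random.default_rng(0)
kernels = rng.normal(size=(6, 1, 5, 5)).astype(np.float32)
q_kernels, scale = prune_and_quantize(kernels, sparsity=0.5, num_bits=8)
print(q_kernels.dtype, scale)
```

Storing the weights as pruned 8-bit integers plus one scale per tensor is what makes a several-fold reduction in weight size plausible, which is the kind of compression the abstract reports before mapping the model into FPGA block RAM.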