Authors
Anand Kumar Mukhopadhyay, Sampurna Majumder, Indrajit Chakrabarti
Identifier
DOI: 10.1016/j.compeleceng.2021.107628
Abstract
A detailed methodology is presented for implementing a fully connected (FC) deep neural network (DNN) and a convolutional neural network (CNN) inference system on a field-programmable gate array (FPGA). The DNN uses minimal computational units, while the CNN uses a systolic array (SA) architecture with parallel-processing capability. Algorithmic analysis determines the optimum memory requirement for the fixed-point trained parameters. The size of the trained parameters and the memory available on the target FPGA device govern the choice of on-chip memory. Experimental results indicate that choosing block over distributed memory saves ≈62% of look-up tables (LUTs) for the DNN ([784-512-512-10]), whereas choosing distributed over block memory saves ≈30% of block random access memory (BRAM) for the LeNet-5 CNN unit. This study provides insights for developing FPGA-based digital systems for applications requiring DNNs and CNNs.
Highlights
• A methodological approach for mapping DNN and CNN inference units onto an FPGA.
• Serial processing for the DNN and parallel systolic-array processing for the CNN.
• Detailed illustration of the VLSI modules used to build the inference architecture.
• Choice of distributed/block on-chip memory for the inference architectures and comparison with related works.
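To make the memory-sizing discussion concrete, the sketch below counts the weights and biases of the [784-512-512-10] FC network named in the abstract and estimates their on-chip footprint. The 16-bit fixed-point width and the 36 Kb BRAM primitive size are illustrative assumptions, not values taken from the paper.

```python
# Back-of-envelope sketch (not from the paper): parameter count and
# on-chip storage estimate for the [784-512-512-10] FC DNN cited in
# the abstract.

LAYERS = [784, 512, 512, 10]   # topology from the abstract
FIXED_POINT_BITS = 16          # assumed fixed-point word width (hypothetical)
BRAM_BITS = 36 * 1024          # assumed 36 Kb block-RAM primitive (hypothetical)

def parameter_count(layers):
    """Weights plus biases of a fully connected network."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(layers, layers[1:]))

params = parameter_count(LAYERS)
total_bits = params * FIXED_POINT_BITS
print(f"parameters     : {params:,}")                     # 669,706
print(f"storage (bits) : {total_bits:,}")
print(f"36Kb BRAMs     : {-(-total_bits // BRAM_BITS)}")  # ceiling division
```

The abstract's CNN datapath is a systolic array. The following cycle-by-cycle simulation is a behavioural sketch of the general output-stationary SA dataflow for a matrix multiply, not the paper's RTL: each processing element (PE) performs one multiply-accumulate per cycle and forwards its operands to its right and bottom neighbours, while rows of A and columns of B enter the array edges with a one-cycle skew.

```python
import numpy as np

def systolic_matmul(A, B):
    """Output-stationary systolic matrix multiply, simulated cycle by cycle."""
    M, K = A.shape
    _, N = B.shape
    acc = np.zeros((M, N))    # per-PE accumulator (result stays in place)
    a_reg = np.zeros((M, N))  # A operand held by each PE this cycle
    b_reg = np.zeros((M, N))  # B operand held by each PE this cycle
    for t in range(M + N + K - 2):  # enough cycles to drain the skewed schedule
        a_new = np.zeros((M, N))
        b_new = np.zeros((M, N))
        for i in range(M):
            for j in range(N):
                # operand from the left neighbour, or the skewed A feed at the edge
                if j == 0:
                    a_new[i, j] = A[i, t - i] if 0 <= t - i < K else 0.0
                else:
                    a_new[i, j] = a_reg[i, j - 1]
                # operand from the top neighbour, or the skewed B feed at the edge
                if i == 0:
                    b_new[i, j] = B[t - j, j] if 0 <= t - j < K else 0.0
                else:
                    b_new[i, j] = b_reg[i - 1, j]
        acc += a_new * b_new  # every PE does one MAC per cycle
        a_reg, b_reg = a_new, b_new
    return acc
```

For random matrices, `np.allclose(systolic_matmul(A, B), A @ B)` holds once the array has drained, which is what the `M + N + K - 2` cycle bound guarantees.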