Field-Programmable Gate Array
Computer Science
Deep Learning
Dataflow
Computer Architecture
Hardware Acceleration
Energy Efficiency
Gate Array
Embedded Systems
Latency
FPGA Prototyping
Computer Hardware
Parallel Computing
Artificial Intelligence
Telecommunications
Electrical Engineering
Engineering
Authors
Yonggen Li, Xin Li, Haibin Shen, Jicong Fan, Yanfeng Xu, Kejie Huang
Source
Journal: ACM Transactions on Reconfigurable Technology and Systems
[Association for Computing Machinery]
Date: 2024-01-15
Abstract
The Field-Programmable Gate Array (FPGA) is a versatile and programmable hardware platform, which makes it a promising candidate for accelerating Deep Neural Networks (DNNs). However, FPGAs' computing energy efficiency is low because interconnect data movement dominates energy consumption. In this paper, we propose an all-digital Compute-In-Memory (CIM) FPGA architecture for deep learning acceleration. Furthermore, we present a bit-serial computing circuit for the digital CIM core that accelerates vector-matrix multiplication (VMM) operations. A Network-CIM-Deployer (NCIMD) is also developed to support automatic deployment and mapping of DNN networks. NCIMD provides a user-friendly API for DNN models in Caffe format. Meanwhile, we introduce a Weight-Stationary (WS) dataflow and describe the method of mapping a single layer of the network onto the CIM array in the architecture. We conduct experimental tests on the proposed FPGA architecture in the field of Deep Learning (DL), as well as in non-DL fields, using different architectural layouts and mapping strategies, and compare the results with a conventional FPGA architecture. The experimental results show that, compared to the conventional FPGA architecture, our proposed CIM FPGA architecture improves energy efficiency by up to 16.1×, while latency decreases by up to 40%.
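The bit-serial VMM scheme described in the abstract can be illustrated with a minimal software model: the weight matrix stays resident (weight-stationary) while the input vector is streamed one bit plane per cycle, and shifted partial sums are accumulated. This is a hypothetical sketch assuming unsigned fixed-point inputs; the function and parameter names are illustrative and not taken from the paper's circuit.

```python
def bit_serial_vmm(x, W, x_bits=8):
    """Compute x @ W bit-serially.

    x: list of unsigned integers (input activations), each < 2**x_bits.
    W: weight matrix as a list of rows, W[i][j]; stays stationary in
       the CIM array while input bits stream through.
    """
    n_in, n_out = len(W), len(W[0])
    acc = [0] * n_out
    for b in range(x_bits):                    # one "cycle" per input bit plane
        bits = [(xi >> b) & 1 for xi in x]     # current bit of every input element
        for j in range(n_out):
            # 1-bit x multi-bit partial product: a sum of selected weights
            partial = sum(bits[i] * W[i][j] for i in range(n_in))
            acc[j] += partial << b             # weight the partial sum by 2^b
    return acc
```

For example, `bit_serial_vmm([3, 5], [[1, 2], [3, 4]])` yields the same result as the ordinary matrix product `[3, 5] @ [[1, 2], [3, 4]] = [18, 26]`, but each cycle only needs 1-bit input operands, which is what makes the digital CIM datapath simple.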