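Computer science
Field-programmable gate array
Throughput
Inference
Convolutional neural network
Computation
Computer hardware
Acceleration
Reuse
Computer engineering
Cyclone (programming language)
Embedded system
Parallel computing
Computer architecture
Artificial intelligence
Algorithm
Operating system
Biology
Wireless
Ecology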
Authors
Truong Quang Vinh, Dinh Viet Hai
Identifier
DOI:10.1142/s0218126621501930
Abstract
Convolutional neural networks (CNNs) are among the most promising algorithms, outperforming traditional methods in classification accuracy. However, several CNNs, such as VGG, demand heavy computation in their convolutional layers. Many accelerators implemented on powerful FPGAs have been introduced to address this problem. In this paper, we present a VGG-based accelerator optimized for a low-cost FPGA. To make efficient use of the FPGA's logic elements and memory, we propose a dedicated input buffer that maximizes data reuse. In addition, we design a low-resource processing engine with the optimal number of multiply-accumulate (MAC) units. In the experiments, we use the VGG16 model for inference to evaluate the performance of our accelerator, achieving a throughput of 38.8 GOPS at a clock speed of 150 MHz on an Intel Cyclone V SX SoC. The experimental results show that our design is better than previous works in terms of resource efficiency.
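To put the reported figures in context, the following is a minimal back-of-the-envelope sketch, not the authors' method. It assumes the standard VGG16 convolutional stack (224x224 input, 3x3 kernels, stride 1, same padding) and the common convention that one MAC counts as two operations. The abstract does not state either assumption, nor whether the 38.8 GOPS figure covers only convolutional layers. The function names are hypothetical.

```python
# Assumed standard VGG16 conv stack: (in_channels, out_channels, output height/width).
VGG16_CONV_LAYERS = [
    (3, 64, 224), (64, 64, 224),
    (64, 128, 112), (128, 128, 112),
    (128, 256, 56), (256, 256, 56), (256, 256, 56),
    (256, 512, 28), (512, 512, 28), (512, 512, 28),
    (512, 512, 14), (512, 512, 14), (512, 512, 14),
]
KERNEL_SIZE = 3  # assumption: all VGG16 conv kernels are 3x3


def vgg16_conv_macs(layers=VGG16_CONV_LAYERS, kernel=KERNEL_SIZE):
    """Total multiply-accumulates for one forward pass through the conv layers."""
    return sum(cin * cout * kernel * kernel * hw * hw for cin, cout, hw in layers)


def macs_per_cycle(throughput_ops, clock_hz, ops_per_mac=2):
    """Average MACs completed per clock cycle implied by a throughput figure.

    Assumes ops_per_mac operations per MAC (2 = one multiply + one add).
    """
    return throughput_ops / (ops_per_mac * clock_hz)


def conv_time_seconds(throughput_ops, ops_per_mac=2):
    """Estimated time for all conv layers of one image at the given throughput."""
    return ops_per_mac * vgg16_conv_macs() / throughput_ops


if __name__ == "__main__":
    reported_throughput = 38.8e9  # GOPS figure from the abstract
    reported_clock = 150e6        # clock speed from the abstract
    print("conv MACs per image:", vgg16_conv_macs())
    print("implied MACs per cycle:", macs_per_cycle(reported_throughput, reported_clock))
    print("estimated conv time (s):", conv_time_seconds(reported_throughput))
```

Under these assumptions, the convolutional layers cost about 15.3 billion MACs per image, and the reported throughput corresponds to roughly 129 MACs sustained per clock cycle, or on the order of 0.8 s of convolution per image. The actual number of MAC units in the design may differ, since utilization is not given in the abstract.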