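Computer science
Field-programmable gate array
Convolutional neural network
Kernel (algebra)
Convolution (computer science)
Process (computing)
Latency (audio)
Hardware acceleration
Deep learning
Computer hardware
Computer architecture
Throughput
Embedded system
Computer engineering
Parallel computing
Artificial neural network
Artificial intelligence
Operating system
Telecommunications
Mathematics
Combinatorics
Wireless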
Authors
Ardian Dwi C, Trio Adiono, Nana Sutisna
Source
Journal: International Conference on Electrical Engineering and Informatics
Date: 2021-10-12
Identifier
DOI:10.1109/iceei52609.2021.9611126
Abstract
Development of hardware accelerators for deep learning has increased rapidly, driven by the demand for flexibility across various deep learning architectures. The architecture most widely marketed in recent years is GPU-based, and developers encounter many difficulties when applying such accelerators to different architectures. This paper presents the design of an FPGA-based accelerator that handles the processing in the convolution layer of a Convolutional Neural Network (CNN). The system is designed with a base clock period of 10 ns and is capable of providing a throughput of 1 Gbyte/s. Tests using a 3 × 3 kernel completed with an iteration time of 2683.92 µs and a latency of 7930 ns. Tests using a 2 × 2 kernel completed with an iteration time of 2643 µs and a latency of 5930 ns. Using the accelerator is shown to speed up the convolution process, with a time difference of up to about 1400 µs compared to the same process carried out in Matlab.
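
For readers unfamiliar with the operation being accelerated, the following is a minimal software sketch of the sliding-window computation performed in a CNN convolution layer, run with a 3 × 3 and a 2 × 2 kernel as in the tests described in the abstract. It is an illustration only, not the authors' hardware design: the function name `conv2d_valid`, the no-padding and stride-1 choices, and the sample image and kernel values are all assumptions made for this example.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding, stride 1, both assumed)
    and sum the elementwise products at each position, as CNN layers do."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = ih - kh + 1, iw - kw + 1  # output shrinks by kernel size - 1
    out = [[0] * ow for _ in range(oh)]
    for r in range(oh):
        for c in range(ow):
            acc = 0
            # Multiply-accumulate over the kernel window; this inner loop is
            # the work a hardware accelerator would typically parallelize.
            for i in range(kh):
                for j in range(kw):
                    acc += image[r + i][c + j] * kernel[i][j]
            out[r][c] = acc
    return out


# Hypothetical 4 x 4 input and kernels, chosen only for illustration.
image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
kernel_3x3 = [[1, 0, -1], [1, 0, -1], [1, 0, -1]]
kernel_2x2 = [[1, 0], [0, 1]]

out3 = conv2d_valid(image, kernel_3x3)  # 2 x 2 output
out2 = conv2d_valid(image, kernel_2x2)  # 3 x 3 output
print(out3)
print(out2)
```

A 3 × 3 kernel needs nine multiply-accumulate operations per output value and a 2 × 2 kernel needs four, which is consistent with the smaller kernel showing the lower latency in the reported results.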