Field-programmable gate array, Convolutional neural network, Computer science, Artificial neural network, Architecture, Binary number, Acceleration, Computer architecture, Parallel computing, Hardware acceleration, Embedded system, Computer hardware, Artificial intelligence, Arithmetic, Mathematics, Art, Physics, Classical mechanics, Visual arts
Authors
Mengfei Ji, Zaid Al-Ars, Yu-Chun Chang, Bao-Lin Zhang
Identifier
DOI:10.1142/s0218126624501706
Abstract
In this paper, we present a fully pipelined, semi-parallel-channel convolutional neural network hardware accelerator structure. This structure trades off compute time against hardware utilization, allowing the accelerator to be layer-pipelined without fully parallelizing the input and output channels. A parallel strategy is applied to reduce the time gap in transferring output results between different layers. The degree of parallelism can be chosen based on the hardware resources available on the target FPGA. We use this structure to implement a binary ResNet18 based on a neural architecture search strategy, which improves the accuracy of manually designed binary convolutional neural networks. Our optimized binary ResNet18 achieves a Top-1 accuracy of 60.5% on the ImageNet dataset. We deploy this ResNet18 hardware implementation on an Alphadata 9H7 FPGA, connected via an OpenCAPI interface, to demonstrate the hardware capabilities. Depending on the amount of parallelism used, the latency ranges from 1.12 to 6.33 ms, with a corresponding throughput of 4.56 to 0.71 TOPS at different hardware utilizations and a 200 MHz clock frequency. Our best latency is [Formula: see text] lower and our best throughput is [Formula: see text] higher compared to the best previous works. The code for our implementation is open-source and publicly available on GitHub at https://github.com/MFJI/NASBRESNET.
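The following is a minimal back-of-the-envelope sketch, in Python rather than the paper's FPGA implementation, of how a semi-parallel channel scheme like the one described above trades cycle count against parallelism. The cycle model, layer shapes, and parallelism factors are illustrative assumptions, not values from the paper; only the 200 MHz clock frequency is taken from the abstract.

# Hypothetical timing model for a semi-parallel, layer-pipelined accelerator.
# Each conv layer processes p_in input channels and p_out output channels per
# cycle; larger parallelism uses more FPGA resources but needs fewer cycles.
CLOCK_HZ = 200e6  # clock frequency quoted in the abstract

def layer_cycles(h, w, k, c_in, c_out, p_in, p_out):
    """Cycles for one k x k conv layer on an h x w feature map when only
    p_in input and p_out output channels are handled in parallel."""
    ceil_div = lambda a, b: -(-a // b)
    return h * w * k * k * ceil_div(c_in, p_in) * ceil_div(c_out, p_out)

def pipeline_timing_ms(layers, p_in, p_out):
    """Single-image latency is roughly the sum of all stage times; in a layer
    pipeline, steady-state throughput is limited by the slowest stage."""
    cycles = [layer_cycles(*shape, p_in, p_out) for shape in layers]
    total_ms = sum(cycles) / CLOCK_HZ * 1e3
    bottleneck_ms = max(cycles) / CLOCK_HZ * 1e3
    return total_ms, bottleneck_ms

# Toy 3-layer network given as (h, w, k, c_in, c_out) -- purely illustrative.
layers = [(56, 56, 3, 64, 64), (28, 28, 3, 64, 128), (14, 14, 3, 128, 256)]
for p in (8, 16, 32):
    total_ms, bottleneck_ms = pipeline_timing_ms(layers, p, p)
    print(f"parallelism {p:2d}: end-to-end ~{total_ms:.2f} ms, "
          f"slowest stage ~{bottleneck_ms:.2f} ms")

Sweeping the parallelism factor in such a model mirrors the design choice reported in the abstract: more parallel channels shorten the bottleneck stage (higher throughput, lower latency) at the cost of higher hardware utilization on the target FPGA.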