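Field-programmable gate array
Computer science
Convolutional neural network
Artificial neural network
Quantization (signal processing)
Throughput
Hardware acceleration
Embedded system
Gate array
Computer hardware
Encoder
Deep learning
Artificial intelligence
Algorithm
Wireless
Operating system
Telecommunications
Authors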
Jinzhou Zhang,Hui Zhang,Bingrui Zhao,Jiaxuan Liu,Xidong Zhou
Abstract
With the rapid development of artificial intelligence, deep neural networks (DNNs) have been widely used in industrial defect detection, intelligent driving, medical research, and other fields. However, deploying DNNs on edge-computing and mobile devices remains difficult because of their high model complexity and heavy demand for computing resources. We therefore designed a neural network hardware accelerator based on a Field Programmable Gate Array (FPGA) for printed circuit board (PCB) defect detection. First, because structural re-parameterization can improve a network's accuracy without increasing the complexity of the inference model, we apply it to the YOLOv2 model and propose RepYOLOv2. Second, an integer-based low-bit quantization method is adopted to quantize the model data to 6 bits. Third, a dedicated convolution computing module and a neural network hardware accelerator are designed according to the characteristics of the model. Experimental results on a Xilinx ZCU102 FPGA show that the system reaches a real-time processing speed of 2.12 FPS, a throughput of 68.53 GOP/s, and a power consumption of only 1.12 W, which compares favorably with similar work.
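The abstract does not spell out the quantization scheme, so the following is only a minimal sketch of what integer-based 6-bit quantization can look like. It assumes symmetric, per-tensor quantization with max-abs scaling, which may differ from the authors' method. The names `quantize_symmetric` and `dequantize` and the sample weights are hypothetical and chosen for illustration.

```python
def quantize_symmetric(values, bits=6):
    """Map floats to signed integers in [-qmax, qmax].

    Assumption: symmetric per-tensor quantization with max-abs scaling.
    For bits=6, qmax = 2**5 - 1 = 31.
    """
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(v) for v in values)
    # Guard against an all-zero tensor, where any scale works.
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer, then clamp to the representable range.
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float values from the integer codes."""
    return [qi * scale for qi in q]


# Hypothetical weights, not taken from the paper.
weights = [0.62, -0.20, 0.05, -0.62, 0.0]
q, scale = quantize_symmetric(weights, bits=6)
restored = dequantize(q, scale)
max_error = max(abs(w, ) if False else abs(w - r) for w, r in zip(weights, restored))
```

Under these assumptions each value is stored as an integer code plus one shared floating-point scale, and the reconstruction error per value is bounded by half the scale. On an FPGA, narrow integer operands like these are what allow multiply-accumulate units to be packed more densely than with wider fixed-point or floating-point data.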