A High-Throughput and Power-Efficient FPGA Implementation of YOLO CNN for Object Detection

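Field-programmable gate array, Throughput, Computer science, Object detection, Object (grammar), Power (physics), Artificial intelligence, Computer vision, Computer architecture, Computer hardware, Embedded system, Pattern recognition (psychology), Telecommunications, Wireless, Physics, Quantum mechanics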

Authors

Duy Thanh Nguyen, Tuan Nghia Nguyen, Hyun Kim, Hyuk-Jae Lee

Source

Journal: IEEE Transactions on Very Large Scale Integration (VLSI) Systems [Institute of Electrical and Electronics Engineers]
Date: 2019-04-01; Volume (issue): 27 (8), pp. 1861-1873; Citations: 368

Identifiers

DOI: 10.1109/tvlsi.2019.2905242

Abstract

Convolutional neural networks (CNNs) require numerous computations and external memory accesses. Frequent accesses to off-chip memory slow down processing and dissipate considerable power. For real-time object detection with high throughput and power efficiency, this paper presents a Tera-OPS streaming hardware accelerator that implements a you-only-look-once (YOLO) CNN. The parameters of the YOLO CNN are retrained and quantized on the PASCAL VOC data set using binary weights and flexible low-bit activations. Binary weights make it possible to store the entire network model in the block RAMs of a field-programmable gate array (FPGA), which aggressively reduces off-chip accesses and thereby significantly enhances performance. In the proposed design, all convolutional layers are fully pipelined for better hardware utilization. The input image is delivered to the accelerator line by line, and, similarly, the output of each layer is transmitted to the next layer line by line. Intermediate data are fully reused across layers, eliminating external memory accesses for intermediate results, and the resulting reduction in dynamic random access memory (DRAM) accesses lowers DRAM power consumption. Furthermore, because the convolutional layers are fully parameterized, the network is easy to scale up. In this streaming design, each convolutional layer is mapped to a dedicated hardware block, so the design outperforms "one-size-fits-all" designs in both performance and power efficiency. Implemented on a VC707 FPGA, the CNN achieves a throughput of 1.877 tera operations per second (TOPS) at 200 MHz with batch processing while consuming 18.29 W of on-chip power, the best power efficiency compared with previous research. In terms of object detection accuracy, it achieves a mean average precision (mAP) of 64.16% on the PASCAL VOC 2007 data set, only 2.63% lower than the mAP of the same YOLO network at full precision.
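The streaming dataflow and binary-weight arithmetic described above can be illustrated with a small Python model. This is a hedged, software-only sketch, not the authors' hardware: it assumes single-channel 3x3 "valid" convolutions, weights of exactly +1/-1 with no scaling factor, an assumed uniform activation quantizer, and a saturation step standing in for whatever requantization the real design performs. All function names (`quantize_activation`, `binary_conv_layer`, `run_pipeline`, `weight_storage_kib`, `gops_per_watt`) are hypothetical, and the 1 Mi-weight model size is an arbitrary example. The last helper only reproduces the abstract's own figures: 1.877 TOPS at 18.29 W is about 102.6 GOPS/W.

```python
"""Toy software model of a binary-weight, line-streamed CNN pipeline.

Hypothetical names throughout; this is not the authors' RTL or their
quantization scheme, only an illustration of the ideas in the abstract.
"""
from collections import deque
from typing import Deque, Iterable, Iterator, List, Sequence

Row = List[int]


def quantize_activation(x: float, bits: int, x_max: float) -> int:
    """Assumed uniform quantizer: clip x to [0, x_max], map to 0..2**bits - 1."""
    levels = (1 << bits) - 1
    x = min(max(x, 0.0), x_max)
    return round(x / x_max * levels)


def binary_conv_layer(rows: Iterable[Row], weights: Sequence[Sequence[int]],
                      bits: int) -> Iterator[Row]:
    """Single-channel 3x3 'valid' convolution with +1/-1 weights.

    Rows arrive one at a time; an output row is emitted as soon as the
    3-row line buffer is full, so no full feature map is ever stored.
    """
    levels = (1 << bits) - 1
    window: Deque[Row] = deque(maxlen=3)  # line buffer: last 3 input rows
    for row in rows:
        window.append(row)
        if len(window) < 3:
            continue  # not enough rows yet for a 3x3 window
        out: Row = []
        for c in range(len(row) - 2):
            acc = 0
            for r in range(3):
                for k in range(3):
                    a = window[r][c + k]
                    # Binary weight: add or subtract, no multiplier needed.
                    acc += a if weights[r][k] > 0 else -a
            # ReLU, then saturate to the low-bit range (stand-in for the
            # requantization a real design would perform).
            out.append(min(max(acc, 0), levels))
        yield out


def run_pipeline(image: Iterable[Row],
                 layers: Sequence[Sequence[Sequence[int]]],
                 bits: int) -> List[Row]:
    """Chain one generator per layer, like one dedicated block per layer."""
    stream: Iterable[Row] = iter(image)
    for weights in layers:
        stream = binary_conv_layer(stream, weights, bits)
    return list(stream)


def weight_storage_kib(num_weights: int, bits_per_weight: int) -> float:
    """On-chip storage needed for the weights, in KiB."""
    return num_weights * bits_per_weight / 8 / 1024


def gops_per_watt(tops: float, watts: float) -> float:
    """Power efficiency (GOPS/W) from throughput (TOPS) and power (W)."""
    return tops * 1000.0 / watts


if __name__ == "__main__":
    ones = [[1] * 5 for _ in range(5)]
    all_plus = [[1] * 3 for _ in range(3)]
    print(run_pipeline(ones, [all_plus, all_plus], bits=2))  # [[3]]
    # Hypothetical model size: 1 Mi weights, binary vs 32-bit float.
    print(weight_storage_kib(1 << 20, 1), weight_storage_kib(1 << 20, 32))
    print(f"{gops_per_watt(1.877, 18.29):.1f} GOPS/W")  # 102.6 GOPS/W
```

Each `binary_conv_layer` keeps only a three-row line buffer and yields an output row as soon as it can, so chaining the generators in `run_pipeline` passes data from layer to layer line by line without ever holding a full intermediate feature map, loosely mirroring the per-layer dedicated blocks the abstract describes. The storage helper shows why binary weights help: they need 1/32 of the space of 32-bit weights.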
