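Computer science
Field-programmable gate array
Quantization (signal processing)
Traffic sign recognition
Convolutional neural network
Hardware acceleration
Computer engineering
Acceleration
Computer hardware
Computation
Residual
Embedded system
Real-time computing
Algorithm
Parallel computing
Artificial intelligence
Traffic sign
Sign (mathematics)
Mathematical analysis
Mathematics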
Authors
Jaemyung Kim,Jin-Ku Kang,Yongwoo Kim
Source
Journal: IEEE Access
[Institute of Electrical and Electronics Engineers]
Date: 2022-01-01
Volume: 10, pages 84626-84634
Citations: 12
Identifier
DOI:10.1109/access.2022.3197906
Abstract
Traffic sign recognition (TSR) technology allows a vehicle to recognize road signs through a camera and use them for driving. TSR is one of the core technologies of advanced driver assistance systems (ADAS) for traffic safety and has been the subject of several studies. The advent of convolutional neural networks (CNNs) has opened up new possibilities in automotive environments, especially for ADAS. However, deploying a real-time TSR application in resource-constrained ADAS is challenging because most CNNs require substantial computing resources and memory. Several works have addressed this problem by optimizing for embedded platforms, but they either used many hardware resources or showed low computation performance. In this paper, we propose a low-cost CNN-based real-time TSR hardware accelerator. First, we extend a novel hardware-friendly quantization method to reduce computational complexity. The method reconstructs the CNN so that all operations, including the skip connection path of residual blocks, use only integer arithmetic, and it reduces computational overhead by replacing the quantization affine mapping with a shift operation. Second, the proposed hardware accelerator applies two parallelization strategies to balance real-time inference against resource consumption. In addition, we present a simple and effective hardware design scheme for the skip connection path of residual blocks, which optimizes the dataflow of that path and reduces additional internal memory usage. Experimental results show that the reconstructed fully integer-based CNN requires only 24M integer operations (IOPs) and has a model size of 0.17 MB. Compared with the previous work, the model size is reduced by a factor of 105 and the number of operations by a factor of 58. The proposed CNN also achieves a TSR accuracy of 99.07%, the highest among CNN-based TSR works implemented on embedded platforms. The proposed hardware accelerator achieves a computation performance of 960 MOPS and a frame rate of 40 FPS when implemented on a Xilinx ZC706 SoC. Consequently, this work improves computation performance by a factor of 11.87 and frame rate by a factor of 36.7 compared to the previous work.
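The idea of replacing the quantization affine mapping with a shift, and of keeping the skip connection of a residual block in integer arithmetic, can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes power-of-two scales (a value `q` with exponent `e` represents `q * 2**-e`), a signed 8-bit output range, and round-half-up rounding, none of which the abstract specifies. The names `requantize_shift` and `residual_add` are hypothetical.

```python
def requantize_shift(acc: int, shift: int, qmin: int = -128, qmax: int = 127) -> int:
    """Rescale an integer accumulator by 2**-shift, then clamp.

    Assumption: the scale is a power of two, so the usual multiply by a
    real-valued scale factor collapses to a bit shift.
    """
    if shift > 0:
        # Add half of the divisor before shifting to round half up.
        acc = (acc + (1 << (shift - 1))) >> shift
    elif shift < 0:
        acc = acc << (-shift)
    # Clamp to the assumed signed 8-bit range.
    return max(qmin, min(qmax, acc))


def residual_add(main_q: int, main_exp: int,
                 skip_q: int, skip_exp: int, out_exp: int) -> int:
    """Integer-only addition of a main path and a skip path.

    Both operands are first aligned to the finer of the two scales with
    left shifts, summed, and then requantized to the output scale.
    """
    common = max(main_exp, skip_exp)
    total = (main_q << (common - main_exp)) + (skip_q << (common - skip_exp))
    return requantize_shift(total, common - out_exp)


# Main path: 40 * 2**-4 = 2.5; skip path: 24 * 2**-3 = 3.0.
# Their sum, 5.5, expressed with exponent 3 is 44.
out = residual_add(main_q=40, main_exp=4, skip_q=24, skip_exp=3, out_exp=3)

# Throughput check using the figures quoted in the abstract.
mops = 24_000_000 * 40 // 1_000_000
```

Under these assumptions no floating-point value appears anywhere on either path. The last line multiplies the reported 24M IOPs per frame by the reported 40 FPS, which gives 960 million operations per second and matches the quoted 960 MOPS.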