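Computer science
Reduced instruction set computing
Computation
Parallel computing
Instruction set
Enhanced Data Rates for GSM Evolution (EDGE)
Algorithm
Artificial intelligence
Authors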
Shihang Wang,Xingbo Wang,Zhiyuan Xu,Bingzhen Chen,Chenxi Feng,Qi Wang,Terry Tao Ye
Identifier
DOI:10.1109/tc.2024.3362060
Abstract
Benefiting from its custom instruction extension capabilities, the RISC-V architecture can be optimized for many domain-specific applications. In this paper, we propose seven RISC-V SIMD (single instruction, multiple data) custom instructions that significantly optimize the convolution, activation and pooling operations in CNN inference. More specifically, the instruction CONV23 greatly speeds up the computation of F(2 × 2, 3 × 3). By adopting the Winograd algorithm, the number of multiplications is reduced from 36 to 16, and the execution time is reduced from 140 to 21 clock cycles. These custom instructions can be executed in batch mode within the acceleration module, where intermediate data can be reused, so the latency and energy overhead associated with excess memory accesses can be eliminated. Using inline assembly in C, the custom instructions can be invoked and compiled together with C source code. A revised RISC-V processor, RI5CY-Accel, is constructed on FPGA to accommodate these custom instructions. Revised LeNet-5, VGG16 and ResNet18 models, called LeNet-Accel, VGG16-Accel and ResNet18-Accel, are also optimized for the RI5CY-Accel architecture. Benchmark experiments demonstrate that inference of LeNet-Accel, VGG16-Accel and ResNet18-Accel on RI5CY-Accel reduces execution latency by over 76.6%, 88.8% and 87.1%, with total energy savings of 74.8%, 87.8% and 85.1%, respectively.
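
The reduction from 36 to 16 multiplications for F(2 × 2, 3 × 3) can be checked with a small software sketch. The code below is not the paper's CONV23 data path; it assumes the commonly used Winograd transform matrices for F(2 × 2, 3 × 3), and the function names (`direct_conv`, `winograd_conv`) are illustrative. Only the element-wise products are counted as multiplications: the input and output transforms contain just 0 and ±1 entries, and the filter transform is assumed to be precomputed once per filter.

```python
from fractions import Fraction

H = Fraction(1, 2)

# Commonly used Winograd F(2x2, 3x3) transform matrices (an assumption here,
# not taken from the paper).
BT = [[1, 0, -1, 0], [0, 1, 1, 0], [0, -1, 1, 0], [0, 1, 0, -1]]  # input transform
G = [[1, 0, 0], [H, H, H], [H, -H, H], [0, 0, 1]]                 # filter transform
AT = [[1, 1, 1, 0], [0, 1, -1, -1]]                               # output transform


def matmul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def direct_conv(d, g):
    """Direct 2x2 output from a 4x4 input tile d and a 3x3 filter g.

    Returns (output, number of multiplications).
    """
    out = [[0, 0], [0, 0]]
    mults = 0
    for i in range(2):
        for j in range(2):
            for u in range(3):
                for v in range(3):
                    out[i][j] += d[i + u][j + v] * g[u][v]
                    mults += 1
    return out, mults


def winograd_conv(d, g):
    """Same 2x2 output via Y = A^T [(G g G^T) * (B^T d B)] A.

    Returns (output, number of element-wise multiplications).
    """
    U = matmul(matmul(G, g), transpose(G))    # 4x4 transformed filter (precomputable)
    V = matmul(matmul(BT, d), transpose(BT))  # 4x4 transformed input tile
    M = [[0] * 4 for _ in range(4)]
    mults = 0
    for i in range(4):
        for j in range(4):
            M[i][j] = U[i][j] * V[i][j]       # the only counted multiplications
            mults += 1
    Y = matmul(matmul(AT, M), transpose(AT))  # 2x2 output tile
    return Y, mults


tile = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
kernel = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
y_direct, n_direct = direct_conv(tile, kernel)
y_winograd, n_winograd = winograd_conv(tile, kernel)
print(n_direct, n_winograd)  # prints: 36 16
```

Exact fractions are used so that the two results can be compared with `==`; a hardware implementation would typically use fixed-point arithmetic instead.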