Computer science
Field-programmable gate array (FPGA)
Dataflow
Memory footprint
Convolutional neural network
Quantization (signal processing)
Design space exploration
Throughput
Edge device
Computation
Computer engineering
Parallel computing
Computer hardware
Algorithm
Embedded system
Artificial intelligence
Cloud computing
Telecommunications
Operating system
Wireless
Authors
Cecilia Latotzke,Tim Ciesielski,Tobias Gemmeke
Identifier
DOI:10.1109/fpl57034.2022.00061
Abstract
Convolutional Neural Networks (CNNs) reach high accuracies in various application domains, but require large amounts of computation and incur costly data movements. One method to decrease these costs at the expense of some accuracy is weight and/or activation word-length reduction. Layer-wise mixed-precision quantization allows for more efficient results, but inflates the design space. In this work, we present an in-depth quantitative methodology to efficiently explore the design space considering the limited hardware resources of a given FPGA. Our holistic exploration approach vertically traverses the various design entry levels from the architectural down to the logic level, and laterally covers optimization from processing elements to dataflow for an efficient mixed-precision CNN accelerator. Our resulting hardware accelerators implement truly mixed-precision operations that enable efficient execution of layer-wise and channel-wise quantized CNNs. Mapping feed-forward and identity-shortcut-connection mixed-precision CNNs results in competitive accuracy-throughput trade-offs: 245 frames/s with 87.48% Top-5 accuracy for ResNet-18, and 92.9% Top-5 accuracy with 1.13 TOps/s for ResNet-152, respectively. Thereby, the required memory footprint for parameters is reduced by 4.9× and 9.4× compared to the respective floating-point baseline.
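To make the word-length-reduction idea concrete, the sketch below shows plain symmetric uniform quantization of a weight vector to a given bit width, together with the parameter-memory reduction relative to a 32-bit floating-point baseline. All function names here are hypothetical; the paper's actual layer-wise/channel-wise scheme, its design-space exploration, and the FPGA mapping are more involved than this.

```python
def quantize_uniform(weights, bits):
    # Symmetric uniform quantization to `bits` bits (illustrative sketch,
    # not the paper's actual quantization flow). The per-tensor scale maps
    # the largest magnitude onto the top of the signed integer range.
    qmax = 2 ** (bits - 1) - 1                      # e.g. 127 for 8 bits
    scale = max(abs(w) for w in weights) / qmax
    q = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    # Recover approximate real-valued weights from integer codes.
    return [v * scale for v in q]

def footprint_reduction(bits):
    # Parameter memory relative to a 32-bit floating-point baseline.
    return 32.0 / bits

# Example: a hypothetical 6-bit layer. 32/6 ≈ 5.3x, in the same range as
# the 4.9x reduction the abstract reports for ResNet-18 (a network-wide
# average over mixed per-layer bit widths).
w = [0.5, -1.2, 0.03, 0.9]
q, s = quantize_uniform(w, bits=6)
w_hat = dequantize(q, s)
```

In a layer-wise mixed-precision setting, `bits` differs per layer (or per channel), so the overall footprint reduction is a weighted average of `32/bits` over all parameters, which is why the reported factors (4.9× and 9.4×) are not powers of two.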