Computer science
Residual
Field-programmable gate array
Scalability
Hardware acceleration
Deep learning
Convolutional neural network
Latency
Computer architecture
Computer engineering
Throughput
Embedded system
Artificial neural network
Artificial intelligence
Computer hardware
Algorithm
Database
Wireless
Operating system
Telecommunications
Authors
Yufei Ma, Minkyu Kim, Yu Cao, Sarma Vrudhula, Jae-sun Seo
Identifier
DOI: 10.1109/iscas.2017.8050344
Abstract
This work presents an efficient hardware accelerator design for deep residual learning algorithms, which have shown superior image recognition accuracy (>90% top-5 accuracy on the ImageNet database). Two key objectives of the acceleration strategy are to (1) maximize resource utilization and minimize data movement, and (2) employ scalable and reusable computing primitives to optimize the physical design under hardware constraints. Furthermore, we present techniques for efficient integration and communication of these primitives in deep residual convolutional neural networks (CNNs), which exhibit complex, non-uniform layer connections. The proposed hardware accelerator efficiently implements the state-of-the-art ResNet-50/152 algorithms on an Arria-10 FPGA, demonstrating 285.1/315.5 GOPS of throughput and 27.2/71.7 ms of latency, respectively.
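The strategy the abstract outlines, one scalable compute primitive reused across all layers plus explicit handling of ResNet's non-uniform shortcut connections, can be illustrated with a minimal software sketch. The Python below is a functional model only, not the authors' FPGA design: the names `conv_primitive` and `bottleneck`, the tiny shapes, and the weight values are all illustrative assumptions standing in for a shared hardware convolution engine and one bottleneck residual block.

```python
import numpy as np

def conv_primitive(x, w, stride=1):
    """Single reusable convolution primitive, applied to every layer.

    In the paper's strategy one scalable compute engine is time-shared
    across all layers; this function plays that role in software.
    x: input feature map, shape (C_in, H, W)
    w: weights, shape (C_out, C_in, K, K); 'same' padding for odd K.
    """
    c_out, c_in, k, _ = w.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    h_out = (x.shape[1] - 1) // stride + 1
    w_out = (x.shape[2] - 1) // stride + 1
    y = np.zeros((c_out, h_out, w_out), dtype=x.dtype)
    for i in range(h_out):
        for j in range(w_out):
            patch = xp[:, i * stride:i * stride + k, j * stride:j * stride + k]
            y[:, i, j] = np.tensordot(w, patch, axes=3)  # one burst of MACs
    return y

def bottleneck(x, w1, w2, w3, w_proj=None, stride=1):
    """ResNet bottleneck block: 1x1 -> 3x3 -> 1x1 conv plus shortcut add.

    The shortcut is the 'non-uniform layer connection': an identity when
    shapes match, or a strided 1x1 projection (w_proj) when the channel
    count or spatial size changes between stages.
    """
    relu = lambda t: np.maximum(t, 0.0)
    y = relu(conv_primitive(x, w1))                  # 1x1, reduce channels
    y = relu(conv_primitive(y, w2, stride=stride))   # 3x3, main compute
    y = conv_primitive(y, w3)                        # 1x1, expand channels
    shortcut = x if w_proj is None else conv_primitive(x, w_proj, stride=stride)
    return relu(y + shortcut)                        # element-wise residual add

# Tiny example with made-up shapes (64 -> 256 channels, as in ResNet-50's
# first bottleneck stage, but on an 8x8 map to keep the run fast):
rng = np.random.default_rng(0)
x = rng.standard_normal((64, 8, 8)).astype(np.float32)
w1 = rng.standard_normal((64, 64, 1, 1)).astype(np.float32) * 0.05
w2 = rng.standard_normal((64, 64, 3, 3)).astype(np.float32) * 0.05
w3 = rng.standard_normal((256, 64, 1, 1)).astype(np.float32) * 0.05
w_proj = rng.standard_normal((256, 64, 1, 1)).astype(np.float32) * 0.05
out = bottleneck(x, w1, w2, w3, w_proj)
print(out.shape)  # (256, 8, 8)
```

On the FPGA, the two spatial loops would presumably map onto a parallel MAC array and the residual add onto a dedicated element-wise unit; the point of the sketch is only the control structure, a single primitive reused across every layer and every shortcut.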