Computer science
DRAM
Bottleneck
Throughput
Deep learning
Embedded system
Computer architecture
Bandwidth (computing)
Computer hardware
Artificial intelligence
Operating system
Wireless
Computer network
Authors
Seongju Lee, Kyuyoung Kim, Sanghoon Oh, Joonhong Park, Gi-Moon Hong, Dongyoon Ka, Kyu-Dong Hwang, Jeongje Park, Kyeong-Pil Kang, Jungyeon Kim, Junyeol Jeon, Nahsung Kim, Yongkee Kwon, Kornijcuk Vladimir, Woojae Shin, Jongsoon Won, Minkyu Lee, Hyunha Joo, Haerang Choi, Jaewook Lee
Identifier
DOI:10.1109/isscc42614.2022.9731711
Abstract
With advances in deep-neural-network applications, increasingly large data movement through memory channels is becoming inevitable: in particular, RNN and MLP applications are memory bound, and memory is the performance bottleneck [1]. DRAM featuring processing-in-memory (PIM) significantly reduces data movement [1]–[4], and system performance is enhanced by the large internal parallel bank bandwidth. Among DRAM-based PIM proposals, [3] is near commercialization, but its reliance on HBM technology may prevent it from being applied to other applications due to HBM's high cost [5]. In this situation, an accelerator-in-memory (AiM) based on GDDR6 may be applicable: it is relatively low cost, compatible with the GDDR6 interface, and designed to accelerate deep-learning (DL) applications. AiM offers a peak throughput of 1 TFLOPS with processing units (PUs) running at 1 GHz, exploiting the characteristics of 16 Gb/s GDDR6. It can also support many applications, as it provides various activation functions. This paper first describes the AiM architecture and the command set supported for DL operations. Next, the DL operations in the PU and the supported activation functions are described. Finally, we present evaluation results of the DL behavior of AiM at the package and system level.