Keywords: cloud computing, computer science, pipeline (software), parallel computing, acceleration, coprocessor, overhead (engineering), computer hardware, operating system, physics, classical mechanics
Authors
Fengbin Tu,Yiqi Wang,Zihan Wu,Ling Liang,Yufei Ding,Bongjin Kim,Leibo Liu,Shaojun Wei,Yuan Xie,Shouyi Yin
Identifier
DOI: 10.1109/isscc42614.2022.9731762
Abstract
Many computing-in-memory (CIM) processors have been proposed for edge deep-learning (DL) acceleration. They usually rely on analog CIM techniques to achieve high-efficiency NN inference with low-precision INT multiply-accumulate (MAC) support [1]. Unlike edge DL, cloud DL has higher accuracy requirements for NN inference and training, which demand additional support for high-precision floating-point (FP) MAC. As shown in Fig. 15.5.1, applying CIM techniques to cloud DL faces three main limitations: 1) FP MAC tightly couples exponent alignment with INT mantissa MAC, and implementing complex exponent alignment in memory would disrupt CIM's direct accumulation structure and reduce efficiency. 2) FP MAC energy is dominated by the INT mantissa MAC, so further accelerating CIM-based INT MAC is critical for processor efficiency. 3) Previous cloud DL processors usually have separate FP and INT engines but activate only one at a time [2], causing high area overhead and low resource utilization.
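The coupling described in limitation 1 can be made concrete: an FP MAC can be decomposed into per-product INT mantissa multiplies, an exponent-alignment shift to a common exponent, and a single integer accumulation. The following is a minimal Python sketch of that decomposition, assuming FP32-like 24-bit mantissas; `fp_mac_via_int` is a hypothetical helper for illustration, not the circuit or dataflow proposed in the paper.

```python
import math

def fp_mac_via_int(products):
    """Multiply-accumulate float pairs via INT mantissa MAC + exponent alignment.

    Hypothetical illustration of the FP-MAC decomposition discussed in the
    abstract, not the paper's actual implementation.
    """
    terms = []
    for a, b in products:
        ma, ea = math.frexp(a)  # a = ma * 2**ea, with 0.5 <= |ma| < 1
        mb, eb = math.frexp(b)
        # Quantize mantissas to 24-bit fixed point (FP32-like assumption),
        # then multiply as plain integers: the "INT mantissa MAC" part.
        m_prod = round(ma * (1 << 24)) * round(mb * (1 << 24))
        terms.append((m_prod, ea + eb))
    # Exponent alignment: right-shift every product to the largest exponent
    # so one integer accumulator suffices (shifted-out bits are lost).
    e_max = max(e for _, e in terms)
    acc = 0
    for m, e in terms:
        acc += m >> (e_max - e)
    # Rescale: products carry 2 * 24 = 48 fractional bits.
    return acc * 2.0 ** (e_max - 48)
```

For example, `fp_mac_via_int([(1.5, 2.0), (0.5, 0.5)])` returns 3.25. The alignment shifts before accumulation are exactly the step that is awkward to fold into a memory array's fixed accumulation path.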