Double-precision floating-point format
Static random-access memory
Floating point
Macro
Bit (key)
Computer science
Parallel computing
Point (geometry)
Floating-point unit
Unit (ring theory)
Domain (mathematical analysis)
Computer hardware
Computational science
Arithmetic
Operating system
Mathematics
Programming language
Mathematics education
Mathematical analysis
Computer security
Geometry
Authors
An Guo, Xi Chen, Fangyuan Dong, Xingyu Pu, Dongqi Li, Jingmin Zhang, Xueshan Dong, Hui Gao, Yiran Zhang, Bo Wang, Jun Yang, Xin Si
Source
Journal: IEEE Journal of Solid-State Circuits
[Institute of Electrical and Electronics Engineers]
Date: 2024-03-25
Volume/Issue: 59 (9): 3032-3044
Citations: 13
Identifier
DOI: 10.1109/jssc.2024.3375359
Abstract
With the rapid advancement of artificial intelligence (AI), computing-in-memory (CIM) structures have been proposed to improve energy efficiency (EF). However, previous CIMs often rely on INT8 data types, which pose challenges for more complex networks, larger datasets, and increasingly intricate tasks. This work presents a double-bit 6T static random-access memory (SRAM)-based floating-point CIM macro using: 1) a cell array with double-bitcells (DBcells) and floating-point computing units (FCUs) to improve throughput without sacrificing inference accuracy; 2) an FCU with a high-bit full-precision multiply cell (HFMC) and a low-bit approximate-calculation multiply cell (LAMC) to reduce internal bandwidth and area cost; 3) a CIM macro architecture with FP processing circuits to support both floating-point multiply-and-accumulate (FP-MAC) and integer (INT) MAC; 4) a new ShareFloatv2 data type to map floating point onto the CIM array; and 5) a lookup-table (LUT)-based TensorFlow training method to improve inference accuracy. A fabricated 28-nm 64-kb digital-domain SRAM-CIM macro achieved the best EF (31.6 TFLOPS/W) and the highest area efficiency (2.05 TFLOPS/mm^2) for FP-MAC with Brain Float16 (BF16) IN/W/OUT on three AI tasks: classification@CIFAR100, detection@COCO, and segmentation@VOC2012.
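To make the BF16 FP-MAC mentioned in the abstract concrete, the sketch below shows how Brain Float16 relates to IEEE-754 float32 (BF16 simply keeps the top 16 bits: 1 sign, 8 exponent, 7 mantissa bits) and how a multiply-accumulate over BF16 inputs with a wider accumulator behaves. This is a generic software illustration of the BF16 format, not the paper's circuit; the function names are illustrative, and the hardware details (DBcells, FCUs, ShareFloatv2) are not modeled here.

```python
import struct

def f32_to_bf16_bits(x: float) -> int:
    """Round-toward-zero conversion: keep the top 16 bits of the
    IEEE-754 float32 encoding (1 sign, 8 exponent, 7 mantissa)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return bits >> 16

def bf16_bits_to_f32(b: int) -> float:
    """Expand BF16 bits back to float32 by zero-padding the mantissa."""
    return struct.unpack("<f", struct.pack("<I", b << 16))[0]

def bf16_mac(activations, weights) -> float:
    """FP-MAC over BF16-rounded operands, accumulating in float32,
    as is typical when inputs/weights are BF16 but the accumulator
    is wider to preserve accuracy."""
    acc = 0.0
    for a, w in zip(activations, weights):
        acc += bf16_bits_to_f32(f32_to_bf16_bits(a)) * \
               bf16_bits_to_f32(f32_to_bf16_bits(w))
    return acc
```

Because BF16 reuses float32's 8-bit exponent, the conversion never changes a value's dynamic range, only its mantissa precision, which is why BF16 is a common drop-in for neural-network inference and training.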