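Computer science
Column (typography)
Parallel computing
Speedup
Latency (audio)
Bottleneck
Matrix multiplication
Efficient energy use
Row and column spaces
Access time
Embedded system
Row
Computer hardware
Computer network
Frame (networking)
Telecommunications
Physics
Quantum mechanics
Database
Electrical engineering
Quantum
Engineering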
Authors
Chengning Wang,Dan Feng,Wei Tong,Jingning Liu
Identifier
DOI:10.1109/dac56929.2023.10247700
Abstract
Emerging cross-point memory can perform vector-matrix multiplication (VMM) in situ for energy-efficient scientific computation. However, the row charging and discharging latency induced by parasitic capacitance is a major performance bottleneck of subarray VMM. We propose a memory-timing-compliant bulk VMM processing-using-memory design that co-optimizes row access and column access by rethinking read access commands and µ-op timing. We propose a row-level-parallelism-adaptive timing termination mechanism that reduces the tail latency of tRCD and tRP by exploiting nonlinear row charging, and a bulk-interleaved row-column-cooperative VMM access mechanism that reduces tRAS and overlaps CL without increasing column ADC precision. Evaluations show that our design achieves a 5.03× performance speedup over an aggressive baseline.
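As background for the in-situ VMM mentioned in the abstract, the following is a minimal sketch of the idealized computation a cross-point array performs: each cell contributes a current equal to its row voltage times its conductance (Ohm's law), and the currents on a column sum (Kirchhoff's current law). The function name `crossbar_vmm` and the sample values are hypothetical and are not taken from the paper. The sketch deliberately ignores parasitic capacitance, wire resistance, and ADC quantization, which are the non-ideal effects the paper is concerned with.

```python
def crossbar_vmm(voltages, conductances):
    """Idealized crossbar VMM (illustrative only, not the paper's design).

    voltages:     one input voltage per row (length R)
    conductances: R x C matrix of cell conductances
    returns:      C column currents, I_j = sum_i V_i * G[i][j]
    """
    if len(voltages) != len(conductances):
        raise ValueError("need exactly one input voltage per row")
    n_cols = len(conductances[0])
    currents = [0.0] * n_cols
    for v, row in zip(voltages, conductances):
        if len(row) != n_cols:
            raise ValueError("conductance matrix must be rectangular")
        for j, g in enumerate(row):
            # Ohm's law per cell, accumulated along the column wire.
            currents[j] += v * g
    return currents


if __name__ == "__main__":
    v = [1.0, 0.5]                  # assumed row voltages
    g = [[2.0, 4.0], [6.0, 8.0]]    # assumed cell conductances
    print(crossbar_vmm(v, g))       # [5.0, 8.0]
```

In this idealized model all rows are driven at once and every column current is available in a single step; the timing parameters named in the abstract (tRCD, tRP, tRAS, CL) describe the latency a real array incurs to reach and read out that result.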