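Computer science
Bottleneck
Scalability
Overlay
Parallel computing
Von Neumann architecture
Distributed computing
Concurrency
Set (abstract data type)
Memory management
Logic gate
Computer architecture
Embedded system
Algorithm
Operating system
Programming language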
Authors
Md. Humayun Kabir, Joshua Hollis, Atiyehsadat Panahi, Jason D. Bakos, Miaoqing Huang, David L. Andrews
Identifier
DOI:10.1109/fccm57271.2023.00052
Abstract
The increasing density of block RAMs (BRAMs) distributed throughout modern Field Programmable Gate Arrays (FPGAs) makes these devices ideal for forming processor-in/near-memory (PIM) architectures. Such architectures break the traditional von Neumann memory bottleneck, which limits concurrency and degrades energy efficiency. Ideally, processing density should scale linearly with BRAM capacity, and the clock frequency should be set by the read/write access time of the BRAM. In this paper, we present a PIM overlay that achieves these goals. Compared to prior published work, we observe a 2.25× improvement in performance, a 2× improvement in logic resource utilization, and a 17× improvement in accumulation delay.
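The two scaling ideals stated in the abstract can be made concrete with a back-of-the-envelope model. This is a minimal sketch, not the authors' overlay design: the class name `PimOverlayModel`, its parameters, and the assumption that every processing element (PE) completes one operation per clock cycle are all hypothetical. The model encodes only what the abstract names as ideal: the number of PEs grows linearly with the number of BRAMs, and the clock period equals the BRAM access time.

```python
from dataclasses import dataclass


@dataclass
class PimOverlayModel:
    """Idealized PIM overlay scaling (hypothetical model, not from the paper).

    num_brams:      number of BRAM blocks used by the overlay
    pes_per_bram:   processing elements attached to each BRAM (assumed)
    bram_access_ns: BRAM read/write access time in nanoseconds
    """

    num_brams: int
    pes_per_bram: int
    bram_access_ns: float

    def clock_mhz(self) -> float:
        # Ideal case: the clock period equals the BRAM access time,
        # so frequency (MHz) = 1000 / access time (ns).
        return 1000.0 / self.bram_access_ns

    def processing_elements(self) -> int:
        # Ideal case: processing density scales linearly with BRAM count.
        return self.num_brams * self.pes_per_bram

    def peak_ops_per_second(self) -> float:
        # Assumption: each PE completes one operation per clock cycle.
        return self.processing_elements() * self.clock_mhz() * 1e6
```

For example, with 100 BRAMs, two PEs per BRAM, and a 2 ns access time (all made-up figures), the model gives a 500 MHz clock and 200 PEs. Doubling the BRAM count doubles the peak throughput, which is the linear scaling the abstract describes as the goal; real overlays fall short of this bound whenever routing or logic, rather than the BRAM, limits the clock.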