计算机科学
瓶颈
加速
管道(软件)
计算机体系结构
内存带宽
并行计算
德拉姆
软件
程序设计范式
嵌入式系统
操作系统
计算机硬件
程序设计语言
作者
Xinfeng Xie,Peng Gu,Yufei Ding,Dimin Niu,Hongzhong Zheng,Yuan Xie
摘要
With the growing number of data-intensive workloads, GPU, which is the state-of-the-art single-instruction-multiple-thread (SIMT) processor, is hindered by the memory bandwidth wall. To alleviate this bottleneck, previously proposed 3D-stacking near-bank computing accelerators benefit from abundant bank-internal bandwidth by bringing computations closer to the DRAM banks. However, these accelerators are specialized for certain application domains with simple architecture data paths and customized software mapping schemes. For general-purpose scenarios, lightweight hardware designs for diverse data paths, architectural supports for the SIMT programming model, and end-to-end software optimizations remain challenging. To address these issues, we propose Memory-centric Processing Unit (MPU), the first SIMT processor based on 3D-stacking near-bank computing architecture. First, to realize diverse data paths with small overheads, MPU adopts a hybrid pipeline with the capability of offloading instructions to near-bank compute-logic. Second, we explore two architectural supports for the SIMT programming model, including a near-bank shared memory design and a multiple activated row-buffers enhancement. Third, we present an end-to-end compilation flow for MPU to support CUDA programs. To fully utilize MPU’s hybrid pipeline, we develop a backend optimization for the instruction offloading decision. The evaluation results of MPU demonstrate 3.46× speedup and 2.57× energy reduction compared with an NVIDIA Tesla V100 GPU on a set of representative data-intensive workloads.
科研通智能强力驱动
Strongly Powered by AbleSci AI