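Computer science
Debugging
Parallel computing
Interleaving
High-level synthesis
Field-programmable gate array
Memory bandwidth
Overhead (engineering)
Embedded system
In-memory processing
Auxiliary memory
Memory management
Computer architecture
Interleaved memory
Computer hardware
Operating system
Semiconductor memory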
Authors
Nicholas Beckwith,Jialiang Zhang,Jing Jane Li
Identifier
DOI:10.1109/fccm53951.2022.9786121
Abstract
FPGAs have been introduced to datacenters as mainstream computing devices to accelerate a wide range of data-intensive applications when paired with heterogeneous memory. By leveraging High-Level Synthesis (HLS), application engineers can not only accelerate their applications but also shorten the time spent designing, debugging and validating accelerators. However, existing HLS flows do not effectively support emerging memory devices such as Intel’s Optane DC Persistent Memory Modules (Optane DCPMM), a storage-class memory in a DIMM form factor. In fact, we observe that some HLS kernels can utilize at best one-tenth of the total memory bandwidth of Optane DCPMM.

To remedy the poor performance of HLS with Optane DCPMM, we augment the existing HLS external memory interface with zero-overhead, application-specific address mapping capabilities. The proposed scheme uses both fine-grained information from variable access patterns and coarse-grained variable-interleaving information to select an optimal hybrid address mapping for high memory bandwidth utilization, in contrast to the default fixed address mapping in existing HLS. Furthermore, our scheme is compatible with existing tool flows such as the Intel FPGA SDK for OpenCL and the Vitis Application Flow, which keeps the adoption barrier low. Using the proposed address mapping scheme and interface, we achieve a 10× speedup on a diverse set of benchmarks, including merge join, matrix multiplication and convolution, without any additional hardware cost.
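To make the idea of address mapping concrete, the following is a minimal Python sketch of generic block interleaving. It is not the authors' scheme. The function names (`map_address`, `channel_histogram`), the channel count of 4, and the two granularities are all assumptions chosen for illustration. It only shows why the interleaving granularity matters: the same sequential scan either spreads across all memory channels or lands on a single one.

```python
from collections import Counter


def map_address(addr, num_channels, granularity):
    """Map a linear byte address to (channel, local_offset).

    Generic block interleaving (illustrative only): consecutive blocks of
    `granularity` bytes are assigned to channels in round-robin order.
    """
    block = addr // granularity
    channel = block % num_channels
    # Offset inside the chosen channel: which of that channel's blocks this
    # is, plus the byte position within the block.
    local = (block // num_channels) * granularity + addr % granularity
    return channel, local


def channel_histogram(addrs, num_channels, granularity):
    """Count how many accesses in `addrs` land on each channel."""
    return Counter(map_address(a, num_channels, granularity)[0] for a in addrs)


NUM_CHANNELS = 4                  # hypothetical channel count
scan = range(0, 16 * 1024, 64)    # sequential 64-byte accesses over 16 KiB

# Fine-grained interleaving (256 B blocks): the scan is spread evenly.
fine = channel_histogram(scan, NUM_CHANNELS, 256)

# Coarse-grained interleaving (1 MiB blocks): the whole scan hits one channel.
coarse = channel_histogram(scan, NUM_CHANNELS, 1 << 20)

print("fine  :", dict(fine))
print("coarse:", dict(coarse))
```

In this toy model, a kernel that streams one array sequentially benefits from the fine-grained mapping, while a kernel that reads several arrays concurrently may prefer each array to sit in its own channel. That tension is one plausible reason a per-application mapping could outperform a single fixed one.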