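Computer science
Dataflow
Software portability
Parallel computing
Speedup
Compiler
Operand
Bottleneck
Computer architecture
Computer hardware
Embedded system
Programming language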
Authors
Zhengrong Wang, Christopher Liu, Tony Nowatzki
Identifier
DOI:10.1109/lca.2022.3203064
Abstract
Although in-memory computing promises to alleviate data movement bottlenecks by parallelizing computation across memory bitlines, key challenges arising from its unique execution model remain unsolved: automatically parallelizing sequential programs; dynamically managing and aligning data in the transposed layout required for bit-serial logic; and mixing in-memory and near-memory computing. These challenges should be solved transparently, so that portability is maintained without exposing hardware details to programmers. In this work, we introduce a novel intermediate representation, the tensor dataflow graph (tDFG), with tensor nodes representing the spatially unrolled data across bitlines and explicit move nodes that align operands in the same bitline, which helps the compiler optimize for massive parallelism and data layout. To maintain transparency and portability, we directly embed the tDFG in the ISA, and it is lowered into bit-serial operations at runtime to hide the hardware details. Evaluated on a cycle-accurate simulator across various data-processing workloads, our approach achieves a 4.5× speedup and a 52% traffic reduction over a state-of-the-art near-memory computing technique.
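The ideas of a transposed layout, operand alignment through a move, and bit-serial execution can be illustrated with a small software model. The sketch below is not the authors' tDFG, ISA, or runtime; it is a toy under stated assumptions: each list position stands for one bitline, a tensor is stored as bit-planes (plane k holds bit k of every element), and the helper names transpose_to_bitplanes, move, bit_serial_add, and neighbor_sum are hypothetical, introduced only for illustration.

```python
from typing import List

BitPlanes = List[List[int]]  # planes[k][i] = bit k of element i (element i sits on "bitline" i)


def transpose_to_bitplanes(values: List[int], width: int) -> BitPlanes:
    """Store a tensor in transposed layout: one bit-plane per bit position."""
    return [[(v >> k) & 1 for v in values] for k in range(width)]


def from_bitplanes(planes: BitPlanes) -> List[int]:
    """Reassemble ordinary integers from the bit-planes."""
    n = len(planes[0])
    return [sum(planes[k][i] << k for k in range(len(planes))) for i in range(n)]


def move(planes: BitPlanes, shift: int) -> BitPlanes:
    """Hypothetical move step: bring element i+shift onto bitline i (zero-fill at the edge)."""
    n = len(planes[0])
    return [[row[i + shift] if 0 <= i + shift < n else 0 for i in range(n)]
            for row in planes]


def bit_serial_add(a: BitPlanes, b: BitPlanes) -> BitPlanes:
    """Add two aligned tensors one bit position per step; every bitline works in parallel.

    The final carry is dropped, so results wrap modulo 2**width.
    """
    carry = [0] * len(a[0])
    out: BitPlanes = []
    for row_a, row_b in zip(a, b):  # least significant bit-plane first
        out.append([x ^ y ^ c for x, y, c in zip(row_a, row_b, carry)])
        carry = [(x & y) | (c & (x ^ y)) for x, y, c in zip(row_a, row_b, carry)]
    return out


def neighbor_sum(values: List[int], width: int = 8) -> List[int]:
    """Compute out[i] = values[i] + values[i+1]; the last element adds 0."""
    planes = transpose_to_bitplanes(values, width)
    aligned = move(planes, 1)  # operands must share a bitline before they can be combined
    return from_bitplanes(bit_serial_add(planes, aligned))


if __name__ == "__main__":
    print(neighbor_sum([1, 2, 3, 4]))
```

In this model the number of addition steps depends on the bit width rather than on the number of elements, which is the source of parallelism the abstract refers to; the move step stands in for the explicit alignment that the abstract attributes to move nodes.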