Computer science
Bottleneck
Scheduling (production processes)
Scalability
Parallel computing
Artificial neural network
Computation
Distributed computing
Memory management
Embedded system
Algorithm
Semiconductor memory
Computer hardware
Artificial intelligence
Mathematical optimization
Operating system
Mathematics
Authors
Seokho Lee, Younghyun Lee, Hyejun Kim, Taehoon Kim, Yongjun Park
Identifiers
DOI:10.23919/date56975.2023.10137105
Abstract
Precision-scalable neural processing units (PSNPUs) efficiently provide native support for quantized neural networks. However, with the recent advancement of deep neural networks, PSNPUs suffer from a severe memory bottleneck because they must perform an extremely large number of simple computations simultaneously. In this study, we first analyze whether the memory bottleneck can be resolved using conventional neural processing unit scheduling techniques. We then introduce new capacity-aware memory allocation and block-level scheduling techniques to minimize the memory bottleneck. Compared with the baseline, the proposed method achieves up to a 2.26× performance improvement by substantially relieving the memory pressure of low-precision computations, without hardware overhead.
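The abstract only names the techniques, so the following is a minimal Python sketch of one plausible reading of capacity-aware, block-level scheduling: split each layer into equal blocks small enough that a block's working set fits the on-chip buffer, then issue blocks in order so off-chip traffic is bounded by one fill/drain per block. All class names, buffer sizes, and the splitting policy here are hypothetical illustrations, not the paper's actual algorithm.

```python
# Toy model of capacity-aware block-level scheduling (illustrative only;
# all sizes and policies below are hypothetical, not from the paper).
from dataclasses import dataclass

@dataclass
class Layer:
    name: str
    ifmap_bytes: int   # input feature-map footprint
    weight_bytes: int  # weight footprint
    ofmap_bytes: int   # output feature-map footprint

BUFFER_CAPACITY = 512 * 1024  # hypothetical 512 KiB on-chip SRAM

def blocks_needed(layer: Layer) -> int:
    """Smallest number of equal blocks whose per-block working set
    fits within the on-chip buffer capacity."""
    total = layer.ifmap_bytes + layer.weight_bytes + layer.ofmap_bytes
    n = 1
    while total / n > BUFFER_CAPACITY:
        n += 1
    return n

def schedule(layers):
    """Emit (layer, block) pairs; each block's working set fits on chip,
    so every block incurs at most one off-chip fill and drain."""
    plan = []
    for layer in layers:
        for b in range(blocks_needed(layer)):
            plan.append((layer.name, b))
    return plan

if __name__ == "__main__":
    net = [
        Layer("conv1", 300_000, 40_000, 300_000),   # fits in 2 blocks
        Layer("conv2", 600_000, 150_000, 600_000),  # needs 3 blocks
    ]
    for name, block in schedule(net):
        print(f"run {name} block {block}")
```

Under this reading, lowering precision shrinks per-block footprints, so fewer blocks (and fewer off-chip transfers) are needed per layer; the paper's reported gains come from relieving exactly this kind of memory pressure.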