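Computer science
Adder
Deep learning
Pipeline (software)
Computer architecture
Computer engineering
Dot product
Field-programmable gate array
Latency (audio)
Computer hardware
Parallel computing
Embedded system
Artificial intelligence
Programming language
Geometry
Telecommunications
Mathematics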
Authors
Qiong Li, Chao Fang, Zhongfeng Wang
Identifier
DOI:10.1109/iscas46773.2023.10182007
Abstract
Posit is a promising alternative to the IEEE-754 floating-point format for deep learning applications because it offers a better trade-off between dynamic range and accuracy. However, hardware implementation of posit arithmetic requires further exploration, especially for the dot-product operations that dominate deep neural networks (DNNs). Existing designs implement the dot product either as a combination of multipliers and an adder tree or as cascaded fused multiply-add units, leading to poor computational efficiency and excessive hardware overhead. To address this issue, we propose an open-source posit dot-product unit, namely PDPU, that facilitates resource-efficient and high-throughput dot-product hardware implementation. PDPU not only features a fused and mixed-precision architecture that eliminates redundant latency and hardware resources, but also has a fine-grained 6-stage pipeline that improves computational efficiency. A configurable PDPU generator is further developed to meet the diverse computational-accuracy needs of various DNNs. Experimental results under a 28nm CMOS process show that PDPU reduces area, latency, and power by up to 43%, 64%, and 70%, respectively, compared with existing implementations. Hence, PDPU has great potential as the computing core of posit-based accelerators for deep learning applications.
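To make the contrast between a fused dot product and cascaded multiply-add units concrete, the following is a minimal software sketch, not the PDPU hardware design. It assumes the usual posit(n, es) bit layout (sign, regime, exponent, fraction) with n = 8 and es = 2, and it takes "fused" to mean that products are accumulated exactly and rounded once at the end, while "cascaded" rounds after every multiply-add. The abstract does not state PDPU's internal precision or rounding behavior, so that reading is an assumption. All function names are hypothetical, and the rounding helper is a simplified nearest-value search rather than the exact posit rounding rule.

```python
from fractions import Fraction
import bisect

def decode_posit(bits, n=8, es=2):
    """Decode an n-bit posit bit pattern into an exact Fraction."""
    mask = (1 << n) - 1
    bits &= mask
    if bits == 0:
        return Fraction(0)
    if bits == 1 << (n - 1):
        raise ValueError("NaR (not a real) has no numeric value")
    negative = (bits >> (n - 1)) == 1
    if negative:
        bits = (-bits) & mask          # two's complement gives the magnitude
    body = format(bits, f"0{n}b")[1:]  # the n-1 bits after the sign
    first = body[0]
    run = len(body) - len(body.lstrip(first))  # regime run length
    k = run - 1 if first == "1" else -run
    rest = body[run + 1:]              # skip the regime terminator bit
    exp_bits = rest[:es].ljust(es, "0")  # missing exponent bits read as 0
    e = int(exp_bits, 2) if es > 0 else 0
    frac_bits = rest[es:]
    frac = Fraction(int(frac_bits, 2), 1 << len(frac_bits)) if frac_bits else Fraction(0)
    value = (1 + frac) * Fraction(2) ** (k * (1 << es) + e)
    return -value if negative else value

def posit_table(n=8, es=2):
    """Sorted list of every real value representable in posit(n, es)."""
    nar = 1 << (n - 1)
    return sorted(decode_posit(b, n, es) for b in range(1 << n) if b != nar)

def round_to_table(x, table):
    """Simplified rounding (assumption): nearest table value, ties toward the lower one."""
    i = bisect.bisect_left(table, x)
    if i == 0:
        return table[0]
    if i == len(table):
        return table[-1]
    lo, hi = table[i - 1], table[i]
    return lo if x - lo <= hi - x else hi

def fused_dot(a_bits, b_bits, table, n=8, es=2):
    """Accumulate all products exactly, then round a single time."""
    total = Fraction(0)
    for a, b in zip(a_bits, b_bits):
        total += decode_posit(a, n, es) * decode_posit(b, n, es)
    return round_to_table(total, table)

def cascaded_dot(a_bits, b_bits, table, n=8, es=2):
    """Round after every multiply-add, as a chain of separate units would."""
    total = Fraction(0)
    for a, b in zip(a_bits, b_bits):
        product = decode_posit(a, n, es) * decode_posit(b, n, es)
        total = round_to_table(total + product, table)
    return total

TABLE = posit_table(8, 2)
# 0x60 decodes to 16 and 0x40 decodes to 1 in posit(8, 2).
A = [0x60, 0x40, 0x40, 0x40, 0x40]
B = [0x40, 0x40, 0x40, 0x40, 0x40]
print(fused_dot(A, B, TABLE), cascaded_dot(A, B, TABLE))
```

In this example the exact sum is 16 + 1 + 1 + 1 + 1 = 20. The representable neighbors of 16 in posit(8, 2) are 15 and 20, so each intermediate result of 17 rounds back to 16 in the cascaded version, which ends at 16, while the single final rounding of the fused version returns 20. The sketch only illustrates the numerical effect of rounding once; the area, latency, and power figures quoted in the abstract concern the hardware and are not modeled here.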