Topics
Sign (mathematics)
Arithmetic
Mathematics
Adder
Multiplier (economics)
Discrete mathematics
Computer science
Algorithm
Algebra over a field
Pure mathematics
Telecommunications
Economics
Macroeconomics
Latency (audio)
Authors
Mohammed Elbtity,Peyton Chandarana,Brendan Reidy,Jason K. Eshraghian,Ramtin Zand
Source
Journal: IEEE Transactions on Circuits and Systems I: Regular Papers
[Institute of Electrical and Electronics Engineers]
Date: 2022-09-23
Volume/Issue: 69 (12): 5135-5146
Citations: 9
Identifiers
DOI:10.1109/tcsi.2022.3206262
Abstract
We propose an approximate tensor processing unit (APTPU), which includes two main components: (1) approximate processing elements (APEs) consisting of a low-precision multiplier and an approximate adder, and (2) pre-approximate units (PAUs), which are shared among the APEs in the APTPU's systolic array and function as the steering logic that pre-processes the operands and feeds them to the APEs. We conduct extensive experiments to evaluate the performance of the APTPU across various configurations and workloads. The results show that the APTPU's systolic array achieves up to $5.2\times$ $\textit{TOPS}/mm^{2}$ and $4.4\times$ $\textit{TOPS}/W$ improvements compared to a conventional systolic array design. A comparison between the proposed APTPU and in-house TPU designs shows that we can achieve approximately $2.5\times$ and $1.2\times$ area and power reductions, respectively, while realizing comparable accuracy. Finally, a comparison with state-of-the-art approximate systolic arrays shows that the APTPU can realize up to $1.58\times$, $2\times$, and $1.78\times$ reductions in delay, power, and area, respectively, under similar design specifications and synthesis constraints.
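To make the idea of an approximate processing element concrete, the sketch below models one APE multiply-accumulate step in Python. The abstract does not specify the exact circuits, so this uses two generic stand-ins from the approximate-arithmetic literature: an operand-truncating low-precision multiplier and a lower-part-OR approximate adder (the low `k` bits are OR-ed instead of added, dropping carries). The function names `approx_add`, `approx_mul`, and `ape_mac` and all parameter choices are illustrative assumptions, not the paper's design.

```python
def approx_add(a: int, b: int, k: int = 4) -> int:
    """Lower-part-OR approximate adder (illustrative, not the paper's circuit):
    the low k bits are combined with a carry-free OR; the upper bits are added
    exactly. Trades small numerical error for a shorter carry chain."""
    mask = (1 << k) - 1
    low = (a & mask) | (b & mask)        # approximate, carry-free lower part
    high = ((a >> k) + (b >> k)) << k    # exact upper part
    return high | low

def approx_mul(a: int, b: int, t: int = 2) -> int:
    """Low-precision multiplier (illustrative): truncate t LSBs of each
    operand before multiplying, then shift back to restore magnitude."""
    return ((a >> t) * (b >> t)) << (2 * t)

def ape_mac(acc: int, activation: int, weight: int) -> int:
    """One approximate processing element (APE) step in the systolic array:
    acc += activation * weight, with both operations approximated."""
    return approx_add(acc, approx_mul(activation, weight))

# Example: exact 12*10 = 120; the approximate datapath yields a nearby value.
print(ape_mac(0, 12, 10))  # → 96
```

In a full systolic array, many such APEs would run in parallel, with the operand pre-processing (here folded into `approx_mul`) hoisted into shared pre-approximate units, which is the area/power-saving structure the abstract describes.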