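Computer science
Multiplication (music)
Multiplier (economics)
Throughput
Process (computing)
Symbol
Speedup
Arithmetic
Floating point
Algorithm
Point (geometry)
Computer hardware
Parallel computing
Mathematics
Telecommunications
Geometry
Combinatorics
Economy
Wireless
Macroeconomics
Operating system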
Authors
Wei Mao,Kai Li,Quan Cheng,Liuyao Dai,Boyu Li,Xinang Xie,He Li,Longyang Lin,Hao Yu
Identifier
DOI:10.1109/tvlsi.2021.3128435
Abstract
There is an emerging need for configurable accelerators that serve high-performance computing (HPC) and artificial intelligence (AI) applications at different precisions. The floating-point (FP) processing element (PE), the key building block of such accelerators, must therefore support multiple precisions while operating energy-efficiently. Existing structures fall short: the high-precision-split (HPS) method leaves the multiplication array underutilized, and the low-precision-combination (LPC) method leads to a long multiterm processing period. In this article, a configurable multiple-precision FP PE based on the LPC structure is proposed, supporting half, single, and double precision. It achieves 100% multiplier utilization of the multiplication array at all precisions and speeds up the comparison and summation process. The design is implemented in a 28-nm process at a clock frequency of 1.429 GHz. Compared with existing multiple-precision FP methods, it saves 63% and 88% of area for FP16 and FP32 operations, respectively. Its maximum throughput is 4× and 20× that of fixed FP32 and FP64 operations, respectively. Compared with previous multiple-precision PEs, it achieves the best energy efficiency, 975.13 GFLOPS/W.
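The abstract contrasts high-precision-split (HPS) with low-precision-combination (LPC) but does not describe the datapath. As a rough illustration of the general combination idea only, and not the authors' circuit, the sketch below multiplies wide FP32/FP64 significands using nothing but small 11×11-bit integer products, then shifts and sums the partial products. The 11-bit unit width (the FP16 significand including its hidden bit) and the helper names `split_chunks` and `lpc_multiply` are assumptions made for this example; exponent handling, rounding, and normalization are omitted.

```python
import random

# Significand widths (hidden leading bit included) of IEEE 754 binary16/32/64.
SIG_BITS = {"fp16": 11, "fp32": 24, "fp64": 53}


def split_chunks(x: int, width: int, n: int) -> list[int]:
    """Split a non-negative integer into n chunks of `width` bits, least significant first."""
    mask = (1 << width) - 1
    return [(x >> (i * width)) & mask for i in range(n)]


def lpc_multiply(a: int, b: int, total_bits: int, unit_bits: int) -> tuple[int, int]:
    """Multiply two significands of at most `total_bits` bits using only
    unit_bits x unit_bits products, then shift and sum the partial products.

    Returns (exact product, number of small multiplications used).
    Hypothetical helper for illustration; not the paper's datapath.
    """
    n = -(-total_bits // unit_bits)  # ceiling division: chunks per operand
    a_chunks = split_chunks(a, unit_bits, n)
    b_chunks = split_chunks(b, unit_bits, n)
    product = 0
    for i, ai in enumerate(a_chunks):
        for j, bj in enumerate(b_chunks):
            # Each ai * bj is what one small hardware multiplier would compute.
            product += (ai * bj) << ((i + j) * unit_bits)
    return product, n * n


if __name__ == "__main__":
    random.seed(0)
    for fmt, bits in SIG_BITS.items():
        # Random normalized significand: leading (hidden) bit forced to 1.
        a = random.getrandbits(bits - 1) | (1 << (bits - 1))
        b = random.getrandbits(bits - 1) | (1 << (bits - 1))
        p, uses = lpc_multiply(a, b, bits, unit_bits=11)
        assert p == a * b  # combination reproduces the full-width product
        print(f"{fmt}: {bits}-bit significands -> {uses} small multiplications")
```

With 11-bit units this naive chunking needs 1, 9, and 25 small multiplications for FP16, FP32, and FP64. A 24-bit FP32 significand occupies three 11-bit chunks (33 bits), so some multiplier bits sit idle; the abstract reports 100% multiplier utilization at all precisions but does not say how the array is partitioned to reach it.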