PL-NPU: An Energy-Efficient Edge-Device DNN Training Processor With Posit-Based Logarithm-Domain Computing

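Computer science, Dataflow, Bottleneck, Domain (mathematical analysis), Parallel computing, Computer hardware, Logarithm, Edge device, Artificial neural network, Floating point, Energy (signal processing), Operand, Computer engineering, Algorithm, Embedded system, Artificial intelligence, Mathematics, Operating system, Cloud computing, Mathematical analysis, Statistics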
Authors
Yang Wang,Dazheng Deng,Leibo Liu,Shaojun Wei,Shouyi Yin
Source
Journal: IEEE Transactions on Circuits and Systems I: Regular Papers [Institute of Electrical and Electronics Engineers]
Volume (issue): 69 (10): 4042-4055 Citations: 12
Identifier
DOI:10.1109/tcsi.2022.3184115
Abstract

Training deep neural networks (DNNs) on edge devices is a practical way to improve model adaptivity to unfamiliar datasets while avoiding privacy disclosure and heavy communication cost. However, in addition to the feed-forward (FF) pass shared with inference, DNN training requires back-propagation (BP) and weight-gradient (WG) computation, which introduce power-hungry floating-point computation, hardware underutilization, and an energy bottleneck from excessive memory access. This paper proposes a DNN training processor named PL-NPU that addresses these challenges with three innovations. First, a posit-based logarithm-domain processing element (PE) adapts to various training data requirements with a low bit-width format and reduces energy by converting complicated arithmetic into simple logarithm-domain operations. Second, a reconfigurable inter-intra-channel-reuse dataflow dynamically adjusts the PE mapping with a regrouping omega network to improve operand reuse for higher hardware utilization. Third, a pointed-stake-shaped codec unit adaptively compresses small values to a variable-length data format while compressing large values to a fixed-length 8b posit format, reducing memory access to break the training energy bottleneck. Simulated in 28 nm CMOS technology, the proposed PL-NPU achieves a maximum frequency of 1040 MHz with 343 mW and 5.28 mm$^{2}$. The peak energy efficiency is 3.87 TFLOPS/W at 0.6 V and 60 MHz. Compared with the state-of-the-art training processor, PL-NPU reaches $3.75\times$ higher energy efficiency and offers a $1.68\times$ speedup when training ResNet18.
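The abstract's first innovation rests on a general property of logarithm-domain arithmetic: a multiplication becomes an addition of logarithms plus a sign update. The Python sketch below illustrates only that property. It is not the PL-NPU processing element, it does not model the posit encoding, and the names `LogNum`, `to_log`, `from_log`, and `log_mul` are hypothetical helpers introduced for illustration.

```python
import math
from typing import NamedTuple


class LogNum(NamedTuple):
    """A nonzero real stored as a sign flag and log2 of its magnitude.

    Hypothetical illustration type; the paper's actual number format
    is not described in the abstract.
    """
    negative: bool
    log2_mag: float


def to_log(x: float) -> LogNum:
    """Convert a nonzero float into the sign/log2 representation."""
    if x == 0.0:
        raise ValueError("zero has no finite logarithm; handle it separately")
    return LogNum(x < 0, math.log2(abs(x)))


def from_log(v: LogNum) -> float:
    """Convert the sign/log2 representation back to a float."""
    mag = 2.0 ** v.log2_mag
    return -mag if v.negative else mag


def log_mul(a: LogNum, b: LogNum) -> LogNum:
    """Multiply two values: XOR the signs and add the log2 magnitudes."""
    return LogNum(a.negative != b.negative, a.log2_mag + b.log2_mag)


# Example: (-0.75) * 6.0 computed with one addition in the log domain.
product = from_log(log_mul(to_log(-0.75), to_log(6.0)))
```

In hardware the logarithms would be held in a narrow fixed-point format, so the adder is much cheaper than a floating-point multiplier; additions of the original values are the harder case in the log domain and are not covered by this sketch.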