ASLog: An Area-Efficient CNN Accelerator for Per-Channel Logarithmic Post-Training Quantization

Quantization (signal processing), Computer science, Arithmetic, Algorithm, Computer hardware, Mathematics, Computer engineering
Authors
Jiawei Xu, Jiangshan Fan, Baolin Nan, Chen Ding, Li‐Rong Zheng, Zhuo Zou, Yuxiang Huan
Source
Journal: IEEE Transactions on Circuits and Systems I: Regular Papers [Institute of Electrical and Electronics Engineers]
Volume (issue): 70 (12): 5380-5393
Identifier
DOI:10.1109/tcsi.2023.3315299
Abstract

Post-training quantization (PTQ) has proven to be an efficient model compression technique for Convolutional Neural Networks (CNNs), requiring neither re-training nor access to labeled datasets. However, it remains challenging for a CNN accelerator to realize the efficiency potential of PTQ methods. Many PTQ techniques pursue high theoretical compression and accuracy while ignoring their impact on the actual hardware implementation, which can cost more in hardware overhead than they save. This paper introduces ASLog, a PTQ-friendly CNN accelerator that explores four key designs through algorithm-hardware co-optimization: the first practical 4-bit logarithmic PTQ pipeline, SLogII; the multiplier-free arithmetic element (AE) design; the energy-efficient bias correction element (BCE) design; and the per-channel quantization friendly (PCF) architecture and dataflow. The proposed SLogII PTQ pipeline pushes the limit of logarithmic PTQ to 4 bits with <2.5% accuracy degradation on various image classification and face recognition tasks. Exploiting approximate computing and a novel encoding and decoding scheme, the proposed SLogII AE consumes >40% less power and area than a common 8-bit multiplier. The BCE and PCF designs proposed in this paper are the first to consider the hardware impact of the widely used per-channel quantization and bias correction techniques, enabling an efficient PTQ-friendly implementation with a small hardware overhead. ASLog is validated in a UMC 40-nm process, with 12.2 TOPS/W energy efficiency and 0.80 mm² core area. ASLog achieves 336.3 GOPS/mm² area efficiency and >500 OPs/Byte operational intensity, improvements of over 1.85× and 1.12×, respectively, compared with previous related work.
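
To make the ideas of per-channel logarithmic quantization and multiplier-free arithmetic concrete, here is a minimal Python sketch. The abstract does not describe the SLogII encoding, the AE datapath, or the bias correction, so this is a generic power-of-two scheme under stated assumptions, not the paper's method: each output channel gets its own scale (its largest weight magnitude), each weight is rounded to a sign and a 3-bit exponent, and a dot product with integer activations then needs only shifts and adds. The function names, the choice of scale, and the separate zero code are all hypothetical.

```python
import math

def quantize_log2_channel(weights, bits=4):
    """Round one channel's weights to sign * scale * 2**(-exponent).

    Assumption: one bit holds the sign and the remaining bits hold the
    exponent. Zero is kept as sign 0 for simplicity; a real 4-bit format
    would have to reserve a code for it.
    """
    max_exp = 2 ** (bits - 1) - 1          # largest exponent (7 for 4 bits)
    scale = max(abs(w) for w in weights)   # per-channel scale (assumed choice)
    if scale == 0.0:
        return 1.0, [(0, 0)] * len(weights)
    codes = []
    for w in weights:
        if w == 0.0:
            codes.append((0, 0))
            continue
        # Nearest power of two in the log domain, clamped to the code range.
        exp = round(-math.log2(abs(w) / scale))
        exp = min(max(exp, 0), max_exp)
        codes.append((1 if w > 0 else -1, exp))
    return scale, codes

def quantize_per_channel(weight_rows, bits=4):
    """Quantize each output channel (row) with its own scale."""
    return [quantize_log2_channel(row, bits) for row in weight_rows]

def shift_mac(activations, codes, max_exp=7):
    """Multiplier-free dot product: shift each activation, then add or subtract.

    The result is in fixed point; the real value is
    acc * scale * 2**(-max_exp).
    """
    acc = 0
    for a, (sign, exp) in zip(activations, codes):
        shifted = a << (max_exp - exp)     # a * 2**(max_exp - exp), no multiplier
        if sign > 0:
            acc += shifted
        elif sign < 0:
            acc -= shifted
    return acc

# Usage: two channels whose weight ranges differ by more than 10x.
channels = quantize_per_channel([[0.5, -0.25, 1.0, 0.0],
                                 [0.02, 0.01, -0.04, 0.005]])
scale, codes = channels[0]
acc = shift_mac([4, 8, 2, 5], codes)
print(acc * scale * 2 ** -7)
```

Because each channel carries its own scale, the second channel uses the same small exponent range as the first even though its weights are far smaller. The final rescaling by `scale * 2**(-max_exp)` is the per-channel step that an accelerator has to handle somewhere in its dataflow.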