Computer science
Artificial neural network
Pruning
Convolutional neural network
Field-programmable gate array
Deep learning
Cellular neural network
Computer hardware
Artificial intelligence
Agronomy
Biology
Authors
Jinook Song, Yunkyo Cho, Jun-Seok Park, Junwoo Jang, Sehwan Lee, Joonho Song, Jae-Gon Lee, Inyup Kang
Identifier
DOI:10.1109/isscc.2019.8662476
Abstract
Deep learning has been widely applied to image and speech recognition. Response time, connectivity, privacy, and security drive applications toward mobile platforms rather than the cloud. For mobile systems-on-a-chip (SoCs), energy-efficient neural processing units (NPUs) have been studied for executing the convolutional layers (CLs) and fully-connected layers (FCLs) [2–5] of deep neural networks. Moreover, as neural networks grow deeper, an NPU needs to integrate 1K or more multiply-accumulate (MAC) units. For energy efficiency, neural-network compression has been studied: pruning neural connections and quantizing weights and features to 8b or even lower fixed-point precision without accuracy loss [1]. One hardware accelerator exploited network sparsity to keep MAC-unit utilization high [3]. However, because it is hard to predict where pruning is possible, that accelerator required complex circuitry to select the array of features corresponding to an array of non-zero weights. To reduce the power of MAC operations, bit-serial multipliers have been applied [5]. In general, neural networks with extremely low or variable bit precision must be trained carefully.
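To make the compression idea concrete, the following is a minimal Python/NumPy sketch, assuming magnitude-based pruning and symmetric linear quantization; it is an illustration of the general technique the abstract cites [1], not the authors' NPU pipeline, and the function name prune_and_quantize and the 50% sparsity target are hypothetical choices.

import numpy as np

def prune_and_quantize(weights: np.ndarray, sparsity: float = 0.5):
    """Zero out the smallest-magnitude weights, then quantize to int8."""
    # Magnitude pruning: keep only the largest (1 - sparsity) fraction of weights.
    threshold = np.quantile(np.abs(weights), sparsity)
    pruned = np.where(np.abs(weights) >= threshold, weights, 0.0)

    # Symmetric linear quantization to signed 8-bit fixed point.
    scale = np.max(np.abs(pruned)) / 127.0 if np.any(pruned) else 1.0
    q = np.clip(np.round(pruned / scale), -128, 127).astype(np.int8)
    return q, scale  # dequantize with q.astype(np.float32) * scale

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(size=(4, 4)).astype(np.float32)
    q, scale = prune_and_quantize(w, sparsity=0.5)
    print("fraction of zero weights:", np.mean(q == 0))

The bit-serial multipliers mentioned above [5] can be modeled in the same toy setting: the weight is consumed one bit per cycle, and each cycle accumulates a shifted partial sum, so an 8b weight takes 8 cycles and lower weight precision directly means fewer cycles. This is a behavioral sketch assuming unsigned weights, not the cited hardware design.

import numpy as np

def bit_serial_mac(features: np.ndarray, weights: np.ndarray, bits: int = 8) -> int:
    """Dot product of integer features with unsigned `bits`-bit weights, bit-serially."""
    acc = 0
    for b in range(bits):                             # one weight bit per "cycle"
        bit_plane = (weights >> b) & 1                # current bit of every weight
        acc += int(np.dot(features, bit_plane)) << b  # add shifted partial sum
    return acc

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.integers(0, 128, size=16)
    w = rng.integers(0, 256, size=16)                 # unsigned 8b weights
    assert bit_serial_mac(x, w) == int(np.dot(x, w))  # matches a parallel int MAC
    print("bit-serial result matches parallel MAC")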