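Computer science
Chip
Semiconductor memory
Convolutional neural network
Artificial neural network
Inference
CMOS chip
Throughput
Embedded system
Computer hardware
Parallel computing
Artificial intelligence
Electronic engineering
Engineering
Telecommunications
Wireless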
Authors
Manuel Le Gallo, Riduan Khaddam-Aljameh, Miloš Stanisavljević, Athanasios Vasilopoulos, Benedikt Kersting, Martino Dazzi, Geethan Karunaratne, Matthias Bräendli, Abhairaj Singh, Silvia Melitta Mueller, Julian Buechel, Xavier Timoneda, Vinay Joshi, Urs Egger, Angelo Garofalo, Αναστάσιος Πετρόπουλος, Theodore Antonakopoulos, Kevin Brew, Samuel Choi, Injo Ok
Source
Journal: Cornell University - arXiv
Date: 2022-12-06
Citations: 18
Identifier
DOI:10.48550/arxiv.2212.02872
Abstract
The need to repeatedly shuttle around synaptic weight values from memory to processing units has been a key source of energy inefficiency associated with hardware implementation of artificial neural networks. Analog in-memory computing (AIMC) with spatially instantiated synaptic weights holds high promise to overcome this challenge, by performing matrix-vector multiplications (MVMs) directly within the network weights stored on a chip to execute an inference workload. However, to achieve end-to-end improvements in latency and energy consumption, AIMC must be combined with on-chip digital operations and communication to move towards configurations in which a full inference workload is realized entirely on-chip. Moreover, it is highly desirable to achieve high MVM and inference accuracy without application-wise re-tuning of the chip. Here, we present a multi-core AIMC chip designed and fabricated in 14-nm complementary metal-oxide-semiconductor (CMOS) technology with backend-integrated phase-change memory (PCM). The fully-integrated chip features 64 256x256 AIMC cores interconnected via an on-chip communication network. It also implements the digital activation functions and processing involved in ResNet convolutional neural networks and long short-term memory (LSTM) networks. We demonstrate near software-equivalent inference accuracy with ResNet and LSTM networks while implementing all the computations associated with the weight layers and the activation functions on-chip. The chip can achieve a maximal throughput of 63.1 TOPS at an energy efficiency of 9.76 TOPS/W for 8-bit input/output matrix-vector multiplications.
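The headline figures in the abstract can be cross-checked with a little arithmetic. The following is a minimal sketch, not taken from the paper: it assumes each crosspoint of a 256x256 core holds one weight, that the quoted throughput and energy efficiency refer to the same operating point, and that one multiply-accumulate counts as two operations. The function names are hypothetical and exist only for illustration.

```python
# Figures quoted in the abstract.
NUM_CORES = 64          # AIMC cores on the chip
ROWS, COLS = 256, 256   # crossbar dimensions of each core
PEAK_TOPS = 63.1        # maximal throughput, tera-operations per second
TOPS_PER_WATT = 9.76    # energy efficiency for 8-bit input/output MVMs


def total_weight_capacity(cores=NUM_CORES, rows=ROWS, cols=COLS):
    """Number of weights the chip can hold.

    Assumption: one weight per crosspoint of each core.
    """
    return cores * rows * cols


def implied_power_watts(tops=PEAK_TOPS, tops_per_watt=TOPS_PER_WATT):
    """Power implied by dividing throughput by energy efficiency.

    Assumption: both figures describe the same operating point.
    """
    return tops / tops_per_watt


def ops_per_full_core_mvm(rows=ROWS, cols=COLS):
    """Operations in one full-size MVM on a single core.

    Assumption: a multiply-accumulate is counted as two operations.
    """
    return 2 * rows * cols


if __name__ == "__main__":
    print(f"Weight capacity: {total_weight_capacity():,}")
    print(f"Implied power:   {implied_power_watts():.2f} W")
    print(f"Ops per MVM:     {ops_per_full_core_mvm():,}")
```

Under these assumptions the chip holds about 4.2 million weights and the two quoted figures imply a power draw of roughly 6.5 W at peak throughput.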