A 64-core mixed-signal in-memory compute chip based on phase-change memory for deep neural network inference

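Keywords: computer science, chip, semiconductor memory, convolutional neural network, artificial neural network, inference, CMOS chip, throughput, embedded system, computer hardware, parallel computing, artificial intelligence, electronic engineering, engineering, telecommunications, wireless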

Authors

Manuel Le Gallo, Riduan Khaddam-Aljameh, Miloš Stanisavljević, Athanasios Vasilopoulos, Benedikt Kersting, Martino Dazzi, Geethan Karunaratne, Matthias Braendli, Abhairaj Singh, Silvia Melitta Mueller, Julian Buechel, Xavier Timoneda, Vinay Joshi, Urs Egger, Angelo Garofalo, Αναστάσιος Πετρόπουλος, Theodore Antonakopoulos, Kevin Brew, Samuel Choi, I. Ok, Timothy M. Philip, Victor Chan, Claire Silvestre, Ishtiaq Ahsan, Nicole Saulnier, Vijaykrishnan Narayanan, Pier Andrea Francese, Evangelos Eleftheriou, Abu Sebastian

Source

Journal: Cornell University - arXiv; Date: 2022-12-06; Citations: 1

Links

arxiv.org, doi.org

Identifiers

DOI: 10.48550/arxiv.2212.02872

Abstract

The need to repeatedly shuttle around synaptic weight values from memory to processing units has been a key source of energy inefficiency associated with hardware implementation of artificial neural networks. Analog in-memory computing (AIMC) with spatially instantiated synaptic weights holds high promise to overcome this challenge, by performing matrix-vector multiplications (MVMs) directly within the network weights stored on a chip to execute an inference workload. However, to achieve end-to-end improvements in latency and energy consumption, AIMC must be combined with on-chip digital operations and communication to move towards configurations in which a full inference workload is realized entirely on-chip. Moreover, it is highly desirable to achieve high MVM and inference accuracy without application-wise re-tuning of the chip. Here, we present a multi-core AIMC chip designed and fabricated in 14-nm complementary metal-oxide-semiconductor (CMOS) technology with backend-integrated phase-change memory (PCM). The fully-integrated chip features 64 256x256 AIMC cores interconnected via an on-chip communication network. It also implements the digital activation functions and processing involved in ResNet convolutional neural networks and long short-term memory (LSTM) networks. We demonstrate near software-equivalent inference accuracy with ResNet and LSTM networks while implementing all the computations associated with the weight layers and the activation functions on-chip. The chip can achieve a maximal throughput of 63.1 TOPS at an energy efficiency of 9.76 TOPS/W for 8-bit input/output matrix-vector multiplications.
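The headline figures in the abstract imply a few other quantities, worked through in the back-of-the-envelope sketch below. It is not taken from the paper. It assumes that one multiply-accumulate counts as two operations, that the quoted throughput and efficiency refer to the same operating point, and that a layer's weight matrix is split into non-overlapping 256x256 tiles, one per core. The function names `derived_figures` and `cores_needed` are hypothetical.

```python
import math

# Figures quoted in the abstract.
NUM_CORES = 64
CORE_ROWS = 256
CORE_COLS = 256
PEAK_TOPS = 63.1              # tera-operations per second, whole chip
EFFICIENCY_TOPS_PER_W = 9.76  # tera-operations per second per watt

# Assumption (not stated in the abstract): 1 multiply-accumulate = 2 operations.
OPS_PER_MAC = 2


def derived_figures():
    """Quantities implied by the abstract's numbers under the assumptions above."""
    crossbar_positions = NUM_CORES * CORE_ROWS * CORE_COLS
    power_w = PEAK_TOPS / EFFICIENCY_TOPS_PER_W
    ops_per_core_mvm = OPS_PER_MAC * CORE_ROWS * CORE_COLS
    tops_per_core = PEAK_TOPS / NUM_CORES
    # Time for one full 256x256 MVM on one core if every core runs at peak rate.
    mvm_time_ns = ops_per_core_mvm / (tops_per_core * 1e12) * 1e9
    return {
        "crossbar_positions": crossbar_positions,
        "power_w": power_w,
        "ops_per_core_mvm": ops_per_core_mvm,
        "mvm_time_ns": mvm_time_ns,
    }


def cores_needed(rows, cols):
    """Cores required for a rows x cols weight matrix under naive tiling.

    Assumption: non-overlapping 256x256 tiles, one tile per core. The chip's
    real layer-to-core mapping is not described in the abstract.
    """
    return math.ceil(rows / CORE_ROWS) * math.ceil(cols / CORE_COLS)


if __name__ == "__main__":
    for name, value in derived_figures().items():
        print(f"{name}: {value:,.3f}")
    print("cores for a 512 x 1000 layer:", cores_needed(512, 1000))
```

Under these assumptions the chip offers about 4.2 million crossbar positions, draws roughly 6.5 W at peak throughput, and completes one 256x256 MVM per core in roughly 133 ns. These are estimates derived from the abstract, not measured values.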
