Softmax function
Transformer
Look-up table
Computer science
Inference
Normalization (sociology)
Computation
Artificial neural network
Latency (audio)
Algorithm
Computer engineering
Artificial intelligence
Voltage
Engineering
Electrical engineering
Telecommunications
Sociology
Programming language
Anthropology
Authors
Joonsang Yu, Junki Park, Seongmin Park, Minsoo Kim, Si-Hwa Lee, Dong Hyun Lee, Jungwook Choi
Source
Journal: Cornell University - arXiv
Date: 2021-12-03
Citations: 5
Identifier
DOI: 10.48550/arxiv.2112.02191
Abstract
Non-linear operations such as GELU, Layer normalization, and Softmax are essential yet costly building blocks of Transformer models. Several prior works simplified these operations with look-up tables or integer computations, but such approximations suffer from inferior accuracy or considerable hardware cost with long latency. This paper proposes an accurate and hardware-friendly approximation framework for efficient Transformer inference. Our framework employs a simple neural network as a universal approximator, with its structure equivalently transformed into a LUT. The proposed framework, called NN-LUT, can accurately replace all the non-linear operations in popular BERT models with significant reductions in area, power consumption, and latency.
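A minimal sketch of the idea described in the abstract, assuming a single-hidden-layer ReLU network fitted to GELU and then read off as a piecewise-linear look-up table of breakpoints, slopes, and intercepts. The table size, breakpoint placement, and least-squares fit below are illustrative assumptions, not the paper's actual NN-LUT construction.

```python
# Sketch: approximate GELU with a tiny ReLU network, then convert that network
# into an equivalent piecewise-linear LUT (assumed construction, for illustration).
import numpy as np
from scipy.special import erf  # reference GELU uses the exact erf form

def gelu(x):
    return 0.5 * x * (1.0 + erf(x / np.sqrt(2.0)))

# Hidden layer with fixed breakpoints t_k and unit input weights: h_k(x) = relu(x - t_k).
breakpoints = np.linspace(-4.0, 4.0, 16)                  # assumed 16-entry table
xs = np.linspace(-6.0, 6.0, 2001)
H = np.maximum(xs[:, None] - breakpoints[None, :], 0.0)   # (N, 16) hidden activations

# The output layer is linear in H, so a least-squares fit gives its weights and bias.
A = np.hstack([H, np.ones((xs.size, 1))])
coef, *_ = np.linalg.lstsq(A, gelu(xs), rcond=None)
w_out, b_out = coef[:-1], coef[-1]

# Equivalent LUT: within segment k the network output is exactly slope[k]*x + intercept[k].
slopes = np.concatenate([[0.0], np.cumsum(w_out)])
intercepts = np.concatenate([[b_out], b_out - np.cumsum(w_out * breakpoints)])

def lut_gelu(x):
    k = np.searchsorted(breakpoints, x, side="right")     # segment index for each x
    return slopes[k] * x + intercepts[k]

# The LUT reproduces the network exactly; its error against GELU depends only on the fit.
net = H @ w_out + b_out
print("max |net - LUT| :", np.max(np.abs(net - lut_gelu(xs))))
print("max |GELU - LUT|:", np.max(np.abs(gelu(xs) - lut_gelu(xs))))
```

Because a one-hidden-layer ReLU network is itself piecewise linear, the table lookup matches the network output exactly; accuracy against the true GELU is then governed only by the number and placement of the breakpoints.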