Your Transformer May Not be as Powerful as You Expect

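Softmax function, Transformer, Computer science, Artificial neural network, Algorithm, Theoretical computer science, Artificial intelligence, Electrical engineering, Voltage, Engineering
Authors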
Shengjie Luo, Shanda Li, Shuxin Zheng, Tie-Yan Liu, Liwei Wang, Di He
Source
Journal: Cornell University - arXiv
Identifier
DOI:10.48550/arxiv.2205.13401
Abstract

Relative Positional Encoding (RPE), which encodes the relative distance between any pair of tokens, is one of the most successful modifications to the original Transformer. As far as we know, theoretical understanding of RPE-based Transformers is largely unexplored. In this work, we mathematically analyze the power of RPE-based Transformers, asking whether the model is capable of approximating any continuous sequence-to-sequence function. One may naturally assume the answer is in the affirmative -- RPE-based Transformers are universal function approximators. However, we present a negative result by showing that there exist continuous sequence-to-sequence functions that RPE-based Transformers cannot approximate, no matter how deep and wide the neural network is. One key reason is that most RPEs are placed inside the softmax attention, which always generates a right stochastic matrix. This restricts the network from capturing positional information in the RPEs and limits its capacity. To overcome the problem and make the model more powerful, we first present sufficient conditions for RPE-based Transformers to achieve universal function approximation. With this theoretical guidance, we develop a novel attention module, called Universal RPE-based (URPE) Attention, which satisfies the conditions. Therefore, the corresponding URPE-based Transformers become universal function approximators. Extensive experiments covering typical architectures and tasks demonstrate that our model is parameter-efficient and can achieve superior performance to strong baselines in a wide range of applications. The code will be made publicly available at https://github.com/lsj2408/URPE.
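
The right-stochastic argument in the abstract can be illustrated with a small sketch. This is not the authors' code: it assumes scalar tokens, identity query/key/value projections, and hypothetical `bias` and `gate` functions chosen only for illustration. When every input token is identical, each row of the softmax matrix sums to 1, so every output position receives the same value regardless of the relative positional bias. The optional `gate` multiplies the softmax matrix elementwise by a Toeplitz matrix; that is our reading of how URPE Attention escapes the constraint and is an assumption here, since the abstract does not spell out the construction.

```python
import math

def softmax_rows(scores):
    """Row-wise softmax of a matrix given as a list of lists."""
    out = []
    for row in scores:
        m = max(row)  # subtract the row maximum for numerical stability
        exps = [math.exp(s - m) for s in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

def rpe_attention(x, bias, gate=None):
    """Single-head attention on scalar tokens with identity projections.

    x    : list of n scalar token values
    bias : function d -> float, relative positional bias for offset d = i - j
    gate : optional function d -> float (hypothetical); if given, the softmax
           matrix is multiplied elementwise by the Toeplitz matrix gate(i - j)
    """
    n = len(x)
    scores = [[x[i] * x[j] + bias(i - j) for j in range(n)] for i in range(n)]
    attn = softmax_rows(scores)
    if gate is not None:
        attn = [[attn[i][j] * gate(i - j) for j in range(n)] for i in range(n)]
    out = [sum(attn[i][j] * x[j] for j in range(n)) for i in range(n)]
    return attn, out

# Four identical tokens: only position could distinguish them.
x = [1.0] * 4
bias = lambda d: -0.5 * abs(d)   # illustrative distance penalty
gate = lambda d: 1.0 + 0.25 * d  # illustrative Toeplitz gate

attn, out = rpe_attention(x, bias)
row_sums = [sum(row) for row in attn]
gated_attn, gated_out = rpe_attention(x, bias, gate)

print("row sums:      ", row_sums)
print("plain outputs: ", out)
print("gated outputs: ", gated_out)
```

In this toy setting the plain outputs are the same at every position, because a weighted average of identical values is that value whatever the weights are. The gated rows no longer sum to 1, so the outputs differ by position.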
