Self-attention Mechanism at the Token Level: Gradient Analysis and Algorithm Optimization

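Computer science, Security token, Transformer, Machine translation, Artificial intelligence, Mechanism (biology), Algorithm, Distortion (music), Voltage, Epistemology, Bandwidth (computing), Amplifier, Philosophy, Physics, Quantum mechanics, Computer security, Computer network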
Authors
Linqing Liu, Xiaolong Xu
Source
Journal: Knowledge Based Systems [Elsevier BV]
Volume/issue: 277: 110784-110784 Citations: 5
Identifier
DOI: 10.1016/j.knosys.2023.110784
Abstract

The self-attention mechanism is a feature processing mechanism for structured data in deep learning models. It has been widely used in transformer-based deep learning models and has demonstrated superior performance in various fields, such as machine translation, speech recognition, text-to-text conversion, and computer vision. The self-attention mechanism mainly focuses on the surface structure of structured data, but it also involves attention between basic data units and self-attention of basic data units in the deeper structure of the data. In this paper, we investigate the forward attention flow and backward gradient flow in the self-attention module of the transformer model based on the sequence-to-sequence data structure used in machine translation tasks. We found that this combination produces a "gradient distortion" phenomenon at the token level of basic data units. We consider this phenomenon a defect and propose a series of solutions to address it theoretically. Furthermore, we conduct experiments and select the most robust solution as the Unevenness-Reduced Self-Attention (URSA) module, which replaces the original self-attention module. The experimental results demonstrate that the "gradient distortion" phenomenon exists both theoretically and numerically, and the URSA module enables the self-attention mechanism to achieve consistent, stable, and effective optimization across different models, tasks, corpora, and evaluation metrics. The URSA module is both simple and highly portable.
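To make the two quantities named in the abstract concrete, the following is a minimal sketch of a forward attention pass and a per-token backward gradient for standard single-head scaled dot-product self-attention. It is not the authors' analysis and it does not implement URSA, whose details the abstract does not give. The toy loss, the random weights, the finite-difference gradient, and every name in the block (`self_attention`, `token_grad_norms`, `Wq`, `Wk`, `Wv`) are assumptions made purely for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over token rows X."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    A = softmax(scores, axis=-1)  # A[i, j]: weight token i places on token j
    return A @ V, A

def token_grad_norms(X, Wq, Wk, Wv, eps=1e-5):
    """Gradient norm of a toy loss w.r.t. each input token (central differences)."""
    def loss(X_in):
        out, _ = self_attention(X_in, Wq, Wk, Wv)
        return 0.5 * np.sum(out ** 2)  # hypothetical loss, for illustration only

    grad = np.zeros_like(X)
    for idx in np.ndindex(*X.shape):
        step = np.zeros_like(X)
        step[idx] = eps
        grad[idx] = (loss(X + step) - loss(X - step)) / (2 * eps)
    return np.linalg.norm(grad, axis=1)

rng = np.random.default_rng(0)
n_tokens, d_model = 5, 8
X = rng.normal(size=(n_tokens, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) / np.sqrt(d_model)
              for _ in range(3))

out, A = self_attention(X, Wq, Wk, Wv)
norms = token_grad_norms(X, Wq, Wk, Wv)

# Forward view: total attention each token receives from all queries.
print("attention received per token:", A.sum(axis=0))
# Backward view: how strongly the toy loss depends on each input token.
print("gradient norm per token:", norms)
```

Each row of `A` sums to 1, but the column sums (attention received) and the per-token gradient norms generally differ from token to token. Whether and how such differences correspond to the "gradient distortion" the paper describes is left to the paper itself.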