On the Emergence of Induction Heads for In-Context Learning

Authors
Musat, Tiberiu; Pimentel, Tiago; Noci, Lorenzo; Stolfo, Alessandro; Sachan, Mrinmaya; Hofmann, Thomas
Source
Journal: Cornell University - arXiv
Identifier
DOI: 10.48550/arxiv.2511.01033
Abstract

Transformers have become the dominant architecture for natural language processing. Part of their success is owed to a remarkable capability known as in-context learning (ICL): they can acquire and apply novel associations solely from their input context, without any updates to their weights. In this work, we study the emergence of induction heads, a previously identified mechanism in two-layer transformers that is particularly important for in-context learning. We uncover a relatively simple and interpretable structure of the weight matrices implementing the induction head. We theoretically explain the origin of this structure using a minimal ICL task formulation and a modified transformer architecture. We give a formal proof that the training dynamics remain constrained to a 19-dimensional subspace of the parameter space. Empirically, we validate this constraint while observing that only 3 dimensions account for the emergence of an induction head. By further studying the training dynamics inside this 3-dimensional subspace, we find that the time until the emergence of an induction head follows a tight asymptotic bound that is quadratic in the input context length.
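
As a rough illustration of the behavior an induction head is usually described as producing (find an earlier occurrence of the current token and copy the token that followed it), the rule can be written as a plain function. This is only a behavioral sketch: it is not the paper's minimal ICL task, its modified architecture, or its weight structure, and the name `induction_predict` is a hypothetical helper introduced here.

```python
from typing import Optional, Sequence


def induction_predict(tokens: Sequence[str]) -> Optional[str]:
    """Hypothetical helper: mimic the input/output rule attributed to an induction head.

    Looks for the most recent earlier occurrence of the last token and
    returns the token that followed it, or None if there is no such match.
    """
    if not tokens:
        return None
    current = tokens[-1]
    # Scan backwards over earlier positions, excluding the last position itself.
    for i in range(len(tokens) - 2, -1, -1):
        if tokens[i] == current:
            # Copy the token that came right after the earlier match.
            return tokens[i + 1]
    return None
```

For the context `["A", "B", "C", "A"]`, the rule matches the earlier `"A"` and returns `"B"`. The association is read from the context alone, with no parameters updated, which is the sense of in-context learning used in the abstract.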