Towards Understanding Convergence and Generalization of AdamW

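Computer science, Generalization, Artificial intelligence, Convergence (economics), Pattern recognition (psychology), Mathematics, Mathematical analysis, Economics, Economic growth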
Authors
Pan Zhou, Xingyu Xie, Zhouchen Lin, Shuicheng Yan
Source
Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence [IEEE Computer Society]
Volume (issue): 46 (9): 6486-6493 Citations: 20
Identifier
DOI:10.1109/tpami.2024.3382294
Abstract

AdamW modifies Adam by adding a decoupled weight decay that decays the network weights at every training iteration. For adaptive algorithms, this decoupled weight decay does not affect the specific optimization steps, and it differs from the widely used $\ell_{2}$-regularizer, which changes the optimization steps by changing the first- and second-order gradient moments. Despite AdamW's great practical success, its convergence behavior and its generalization improvement over Adam and $\ell_{2}$-regularized Adam ($\ell_{2}$-Adam) have not yet been established. To address this, we prove the convergence of AdamW and justify its generalization advantages over Adam and $\ell_{2}$-Adam. Specifically, AdamW provably converges, but it minimizes a dynamically regularized loss that combines the vanilla loss with a dynamical regularization induced by the decoupled weight decay, and thus behaves differently from Adam and $\ell_{2}$-Adam. Moreover, on both general nonconvex problems and PŁ-conditioned problems, we establish the stochastic gradient complexity of AdamW for finding a stationary point. This complexity also applies to Adam and $\ell_{2}$-Adam and improves their previously known complexity, especially for over-parametrized networks. In addition, we prove that AdamW enjoys smaller generalization errors than Adam and $\ell_{2}$-Adam from the Bayesian posterior perspective. This result, for the first time, explicitly reveals the benefits of decoupled weight decay in AdamW. Experimental results validate our theory.
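
The difference between decoupled weight decay and the $\ell_{2}$-regularizer described in the abstract can be illustrated with a minimal sketch. It assumes the commonly used formulation in which the decoupled decay is scaled by the learning rate. The function name `adam_style_step`, its parameters, and the toy values are hypothetical and are not taken from the paper, whose exact algorithm may differ.

```python
import math

def adam_style_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
                    eps=1e-8, weight_decay=0.0, decoupled=True):
    """One Adam-style update on lists of floats (illustrative sketch only).

    decoupled=True  : AdamW-like; decay is applied directly to the weights
                      and never enters the gradient moments.
    decoupled=False : l2-Adam-like; weight_decay * w is added to the
                      gradient, so it changes both moment estimates.
    t is the 1-based iteration counter used for bias correction.
    """
    new_w, new_m, new_v = [], [], []
    for wi, gi, mi, vi in zip(w, grad, m, v):
        if not decoupled:
            # l2 regularizer: the penalty gradient is mixed into the gradient.
            gi = gi + weight_decay * wi
        mi = beta1 * mi + (1 - beta1) * gi        # first-moment estimate
        vi = beta2 * vi + (1 - beta2) * gi * gi   # second-moment estimate
        m_hat = mi / (1 - beta1 ** t)             # bias correction
        v_hat = vi / (1 - beta2 ** t)
        step = lr * m_hat / (math.sqrt(v_hat) + eps)
        if decoupled:
            # Decoupled decay: shrink the weight outside the adaptive step.
            step += lr * weight_decay * wi
        new_w.append(wi - step)
        new_m.append(mi)
        new_v.append(vi)
    return new_w, new_m, new_v

# Toy comparison with a zero loss gradient, so only the decay term acts.
w0, g0, zeros = [1.0], [0.0], [0.0]
w_adamw, m_adamw, v_adamw = adam_style_step(
    w0, g0, zeros, zeros, t=1, lr=0.1, weight_decay=0.5, decoupled=True)
w_l2, m_l2, v_l2 = adam_style_step(
    w0, g0, zeros, zeros, t=1, lr=0.1, weight_decay=0.5, decoupled=False)
```

In this toy setting the decoupled variant leaves both moment estimates at zero and only shrinks the weight, whereas the $\ell_{2}$ variant feeds the penalty through the moments, so the decay is rescaled by the adaptive denominator.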