Computer science
Rate of convergence
Benchmark (surveying)
Convergence (economics)
Adaptive control
Mathematical optimization
Artificial intelligence
Mathematical proof
Mathematics
Control (management)
Key (lock)
Geodesy
Economic growth
Economy
Geography
Geometry
Computer security
Authors
Kushal Chakrabarti, Nikhil Chopra
Source
Journal: Automatica
[Elsevier]
Date: 2024-02-01
Volume/Issue: 160: 111466
Identifier
DOI:10.1016/j.automatica.2023.111466
Abstract
Gradient-based optimization and control frameworks have been utilized in several applications. The learning rate parameter is typically chosen following a schedule or using methods such as line search to enhance the convergence rate. Recently, the machine learning community has developed methodologies for automated tuning of the learning rate, known as adaptive gradient methods. This paper develops a control theory-inspired framework for modeling adaptive gradient methods that solve non-convex optimization problems. We first model the adaptive gradient methods in a state-space framework, which allows us to present simpler convergence proofs of prominent adaptive optimizers, such as AdaGrad, Adam, and AdaBelief. The proposed framework is constructive because it allows synthesizing new adaptive optimizers. To illustrate this fact, we then utilize the transfer function paradigm from classical control to propose a new variant of Adam, coined AdamSSM, and prove its convergence. We add an appropriate pole-zero pair in the transfer function from the squared gradients to the second moment estimate. Applications on benchmark machine learning tasks of image classification using CNN architectures and language modeling using an LSTM architecture demonstrate that the AdamSSM algorithm improves the trade-off between generalization accuracy and convergence speed compared with recent adaptive gradient methods.
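The transfer-function idea in the abstract can be sketched in code: Adam's second-moment recursion v_t = β₂ v_{t−1} + (1 − β₂) g_t² is a one-pole low-pass filter from the squared gradient to v, and AdamSSM augments that filter with an extra pole-zero pair. The plain-Python sketch below illustrates this structure only; the specific pole/zero constants, the unit-DC-gain normalization, the clamping of v, and the Adam-style bias correction are our illustrative assumptions, not the paper's tuned design.

```python
import math

def adamssm_step(x, g, state, lr=0.1, b1=0.9, b2=0.999,
                 pole=0.5, zero=0.4, eps=1e-8):
    """One AdamSSM-style update on a scalar parameter (illustrative sketch).

    The second-moment filter is Adam's one-pole filter with an added
    pole-zero pair; pole/zero values here are hypothetical.
    """
    m, v, v_prev, u_prev, t = state
    t += 1
    u = g * g  # squared gradient: the filter input

    # First-moment estimate, exactly as in Adam.
    m = b1 * m + (1 - b1) * g

    # Second-moment estimate: the one-pole filter (1-b2)/(1 - b2 z^-1)
    # cascaded with an extra pole-zero pair (1 - zero*z^-1)/(1 - pole*z^-1),
    # scaled by k so the overall DC gain stays 1 (v still tracks mean(g^2)).
    k = (1 - pole) / (1 - zero)
    v_new = ((1 - b2) * k * (u - zero * u_prev)
             + (b2 + pole) * v - b2 * pole * v_prev)
    v_prev, v = v, v_new

    # Adam-style bias correction (an approximation for the augmented
    # filter); clamp v at 0 since the zero can make it dip negative.
    m_hat = m / (1 - b1 ** t)
    v_hat = max(v, 0.0) / (1 - b2 ** t)

    x = x - lr * m_hat / (math.sqrt(v_hat) + eps)
    return x, (m, v, v_prev, u, t)

# Demo: minimize f(x) = (x - 3)^2 starting from x = 0.
x, state = 0.0, (0.0, 0.0, 0.0, 0.0, 0)
for _ in range(500):
    g = 2.0 * (x - 3.0)   # gradient of f
    x, state = adamssm_step(x, g, state)
```

With pole = zero = 0 the filter reduces exactly to Adam's second-moment update, which makes the sketch a convenient way to compare the two recursions side by side.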