Keywords
Smoothness, Stochastic gradient descent, Iterated function, Minimax, Simplicity (philosophy), Stochastic optimization, Mathematics, Mathematical optimization, Convex function, Convergence (economics), Applied mathematics, Rate of convergence, Computer science, Stochastic approximation, Regular polygon, Gradient descent, Artificial neural network, Artificial intelligence, Mathematical analysis, Key (lock), Geometry, Philosophy, Economic growth, Epistemology, Economics, Computer security
Authors
Ohad Shamir, Tong Zhang
Source
Venue: International Conference on Machine Learning
Date: 2013-06-16
Volume/Issue: 28 (1): 71-79
Citations: 415
Abstract
Stochastic Gradient Descent (SGD) is one of the simplest and most popular stochastic optimization methods. While it has already been theoretically studied for decades, the classical analysis usually required non-trivial smoothness assumptions, which do not apply to many modern applications of SGD with non-smooth objective functions such as support vector machines. In this paper, we investigate the performance of SGD without such smoothness assumptions, as well as a running average scheme to convert the SGD iterates to a solution with optimal optimization accuracy. In this framework, we prove that after T rounds, the suboptimality of the last SGD iterate scales as O(log(T)/√T) for non-smooth convex objective functions, and O(log(T)/T) in the non-smooth strongly convex case. To the best of our knowledge, these are the first bounds of this kind, and almost match the minimax-optimal rates obtainable by appropriate averaging schemes. We also propose a new and simple averaging scheme, which not only attains optimal rates, but can also be easily computed on-the-fly (in contrast, the suffix averaging scheme proposed in Rakhlin et al. (2011) is not as simple to implement). Finally, we provide some experimental illustrations.
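To make the setting in the abstract concrete, the sketch below runs stochastic subgradient descent on a non-smooth, strongly convex hinge-loss (SVM) objective and maintains, on the fly, both the last iterate and a polynomial-decay running average of the kind the paper advocates. This is a minimal illustrative example, not the authors' reference implementation: the function name sgd_hinge, the step size 1/(lam*t), the projection radius, the averaging parameter eta_avg, and the synthetic data are all assumptions made here for illustration.

import numpy as np

def sgd_hinge(X, y, lam=0.01, T=1000, eta_avg=3, seed=0):
    """SGD on the lam-strongly-convex hinge-loss (SVM) objective
    f(w) = lam/2 * ||w||^2 + (1/n) * sum_i max(0, 1 - y_i <w, x_i>),
    which is non-smooth. Returns the last iterate and a
    polynomial-decay running average maintained on the fly."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)       # last iterate w_t
    w_avg = np.zeros(d)   # running average, one extra O(d) update per round

    for t in range(1, T + 1):
        i = rng.integers(n)
        # Stochastic subgradient of the objective at w (non-smooth at the hinge).
        g = lam * w
        if y[i] * X[i].dot(w) < 1.0:
            g = g - y[i] * X[i]
        # Step size 1/(lam*t), the usual choice for lam-strongly-convex objectives.
        w = w - g / (lam * t)
        # Project onto a ball that contains the optimum, keeping iterates bounded.
        radius = 1.0 / np.sqrt(lam)
        norm = np.linalg.norm(w)
        if norm > radius:
            w = w * (radius / norm)
        # Polynomial-decay averaging: weights recent iterates more heavily
        # and requires no storage of past iterates.
        rho = (eta_avg + 1.0) / (t + eta_avg)
        w_avg = (1.0 - rho) * w_avg + rho * w

    return w, w_avg

# Illustrative usage on synthetic linearly separable data (assumed toy setup).
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = np.sign(X @ rng.normal(size=5))
y[y == 0] = 1.0
w_last, w_poly = sgd_hinge(X, y)

Because the running average is just one extra vector updated each round, it is as cheap to maintain as the last iterate itself, which is the practical contrast the abstract draws with the suffix averaging scheme of Rakhlin et al. (2011).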