Keywords
Lipschitz continuity
Regularization (linguistics)
Mathematics
Constant (computer programming)
Applied mathematics
Solver
Bounded function
Rate of convergence
Convergence (economics)
Quadratic equation
Mathematical optimization
Artificial neural network
Computer science
Artificial intelligence
Mathematical analysis
Economic growth
Channel (broadcasting)
Economy
Programming language
Computer network
Geometry
Authors
Dounia Lakhmiri, Dominique Orban, Andrea Lodi
Source
Journal: Cornell University - arXiv
Date: 2022-06-14
Identifier
DOI: 10.48550/arxiv.2206.06531
Abstract
We consider the problem of training a deep neural network with nonsmooth regularization to retrieve a sparse and efficient sub-structure. Our regularizer is only assumed to be lower semi-continuous and prox-bounded. We combine an adaptive quadratic regularization approach with proximal stochastic gradient principles to derive a new solver, called SR2, whose convergence and worst-case complexity are established without knowledge or approximation of the gradient's Lipschitz constant. We formulate a stopping criterion that ensures an appropriate first-order stationarity measure converges to zero under certain conditions. We establish a worst-case iteration complexity of $\mathcal{O}(\epsilon^{-2})$ that matches those of related methods like ProxGEN, where the learning rate is assumed to be related to the Lipschitz constant. Our experiments on network instances trained on CIFAR-10 and CIFAR-100 with $\ell_1$ and $\ell_0$ regularizations show that SR2 consistently achieves higher sparsity and accuracy than related methods such as ProxGEN and ProxSGD.
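To make the general idea concrete, below is a minimal Python sketch of an adaptive proximal-gradient loop in the spirit the abstract describes: a quadratic-regularization weight sigma (playing the role of an inverse step size) is adjusted from the observed decrease of the regularized objective, so no Lipschitz constant of the gradient is needed, and the nonsmooth $\ell_1$ term is handled through its proximal operator (soft-thresholding). The function names, the acceptance ratio test, the update factors eta and gamma, and the stationarity check are illustrative assumptions; this is not the paper's exact SR2 update or its complexity-bearing rules.

```python
import numpy as np

def prox_l1(x, t):
    """Soft-thresholding: proximal operator of t * ||.||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def sr2_like_sketch(loss_fn, grad_fn, w0, lam=1e-3, sigma=1.0,
                    eta=0.1, gamma=3.0, max_iter=1000, tol=1e-4):
    """Illustrative adaptive proximal stochastic-gradient loop (not the paper's
    exact SR2 algorithm).  loss_fn and grad_fn are assumed to return a
    mini-batch loss and a stochastic gradient at w."""
    w = w0.copy()
    for _ in range(max_iter):
        g = grad_fn(w)                              # stochastic gradient
        # Trial step: argmin_s  g.s + (sigma/2)||s||^2 + lam*||w + s||_1
        w_trial = prox_l1(w - g / sigma, lam / sigma)
        s = w_trial - w
        # Predicted decrease of the local model vs. actual decrease of the
        # regularized objective.
        pred = -(g @ s) - 0.5 * sigma * (s @ s) \
               + lam * (np.abs(w).sum() - np.abs(w_trial).sum())
        ared = (loss_fn(w) + lam * np.abs(w).sum()) \
               - (loss_fn(w_trial) + lam * np.abs(w_trial).sum())
        if pred > 0 and ared >= eta * pred:
            w = w_trial
            sigma = max(sigma / gamma, 1e-8)        # successful step: relax sigma
        else:
            sigma *= gamma                          # rejected step: regularize more
        # Crude first-order stationarity proxy based on the scaled step length.
        if sigma * np.linalg.norm(s) <= tol:
            break
    return w
```

The acceptance-ratio mechanism is what removes the need for a Lipschitz constant here: sigma grows only when the quadratic model over-predicts the decrease, and shrinks otherwise, so the effective step size adapts to the observed local curvature rather than to a global bound.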