Keywords
Computer science, Convergence (economics), Stochastic gradient descent, Computation, Simplicity (philosophy), Symbol, Artificial intelligence, Theoretical computer science, Machine learning, Algorithm, Artificial neural network, Mathematics, Arithmetic, Philosophy, Economics, Epistemology, Economic growth
Authors
Kun Yang, Shengbo Chen, Cong Shen
Identifier
DOI:10.1109/jsac.2022.3229443
Abstract
Modern distributed machine learning (ML) paradigms, such as federated learning (FL), utilize data distributed at different clients to train a global model. In such a paradigm, local datasets never leave the clients for better privacy protection, and the parameter server (PS) only performs simple aggregation. In practice, however, there is often some amount of data available at the PS, and its computation capability is strong enough to carry out more demanding tasks than simple model aggregation. The focus of this paper is to analyze the model convergence of a new hybrid learning architecture, which leverages the PS dataset and its computation power for collaborative model training with clients. Different from FL, where stochastic gradient descent (SGD) is always computed in parallel across clients, the new architecture has both parallel SGD at the clients and sequential SGD at the PS. We analyze the convergence rate upper bounds of this aggregate-then-advance design for both strongly convex and non-convex loss functions. We show that when the local SGD has an $\mathcal{O}(1/t)$ stepsize, the server SGD needs to scale its stepsize to no slower than $\mathcal{O}(1/t^{2})$ in order to strictly outperform local SGD with strongly convex loss functions. The theoretical findings are corroborated by numerical experiments, where advantages in terms of both accuracy and convergence speed over clients-only (local SGD and FedAvg) and server-only training are demonstrated.
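The aggregate-then-advance round described in the abstract can be illustrated with a minimal simulation sketch: clients run parallel local SGD from the current global model, the PS averages the resulting models, and the PS then advances the average with sequential SGD steps on its own data, with the client stepsize decaying as $\mathcal{O}(1/t)$ and the server stepsize as $\mathcal{O}(1/t^{2})$. The snippet below is not the authors' implementation; the quadratic loss, the data shapes, the stepsize constants, and the names `grad`, `hybrid_round`, `make_split`, `local_steps`, and `server_steps` are all illustrative assumptions.

```python
import numpy as np

def grad(w, X, y):
    """Gradient of the least-squares loss 0.5 * ||X w - y||^2 / n (toy stand-in)."""
    return X.T @ (X @ w - y) / len(y)

def hybrid_round(w, client_data, server_data, t, local_steps=5, server_steps=5):
    # Client stepsize decays as O(1/t); server stepsize as O(1/t^2), mirroring
    # the scaling condition highlighted in the abstract. The constant 0.5 is an
    # arbitrary illustrative choice.
    eta_local = 0.5 / (t + 1)
    eta_server = 0.5 / (t + 1) ** 2

    # Parallel local SGD at the clients, each starting from the global model.
    local_models = []
    for X, y in client_data:
        w_k = w.copy()
        for _ in range(local_steps):
            w_k -= eta_local * grad(w_k, X, y)
        local_models.append(w_k)

    # Aggregate: simple model averaging at the PS (as in FedAvg).
    w = np.mean(local_models, axis=0)

    # Advance: sequential SGD at the PS on its own dataset.
    X_s, y_s = server_data
    for _ in range(server_steps):
        w -= eta_server * grad(w, X_s, y_s)
    return w

# Toy run on synthetic linear-regression data split across 5 clients and the PS.
rng = np.random.default_rng(0)
d, n = 10, 50
w_star = rng.normal(size=d)

def make_split():
    X = rng.normal(size=(n, d))
    y = X @ w_star + 0.1 * rng.normal(size=n)
    return X, y

client_data = [make_split() for _ in range(5)]
server_data = make_split()

w = np.zeros(d)
for t in range(100):
    w = hybrid_round(w, client_data, server_data, t)
print("distance to w_star:", np.linalg.norm(w - w_star))
```

In this sketch the faster-decaying server stepsize simply follows the abstract's condition for strictly outperforming local SGD under strongly convex losses; in practice the constants and the numbers of local and server steps would need to be tuned for the problem at hand.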