Computer science
Federated learning
Upsampling
Divergence (linguistics)
Mobile device
Artificial intelligence
Distributed database
Distributed learning
Mobile computing
Artificial neural network
Data modeling
Machine learning
Data mining
Distributed computing
Computer network
Database
Philosophy
Image (mathematics)
Operating system
Linguistics
Pedagogy
Psychology
Authors
Moming Duan,Duo Liu,Xianzhang Chen,Renping Liu,Yujuan Tan,Liang Liang
Identifier
DOI:10.1109/tpds.2020.3009406
Abstract
Federated learning (FL) is a distributed deep learning method that enables multiple participants, such as mobile and IoT devices, to collaboratively train a neural network while their private training data remains on local devices. This distributed approach is promising for mobile systems, which hold large corpora of decentralized data and require strong privacy. However, unlike common benchmark datasets, the data distribution in mobile systems is imbalanced, which increases model bias. In this article, we demonstrate that imbalanced distributed training data causes accuracy degradation in FL applications. To counter this problem, we build a self-balancing FL framework named Astraea, which alleviates the imbalances by 1) Z-score-based data augmentation, and 2) mediator-based multi-client rescheduling. The proposed framework relieves global imbalance through adaptive data augmentation and downsampling, and to average out local imbalance, it creates mediators that reschedule client training based on the Kullback-Leibler divergence (KLD) of their data distributions. Compared with FedAvg, the vanilla FL algorithm, Astraea improves top-1 accuracy by +4.39 and +6.51 percent on the imbalanced EMNIST and imbalanced CINIC-10 datasets, respectively. Meanwhile, Astraea reduces communication traffic by 75 percent compared to FedAvg.
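To make the mediator-based rescheduling idea concrete, the sketch below shows one plausible greedy scheme: each client is assigned to the mediator whose combined label distribution, after adding that client's data, is closest to uniform under KLD. This is an illustrative reconstruction, not the authors' implementation; the function names (`kld`, `assign_clients`), the greedy assignment order, and the `max_clients` capacity limit are assumptions for the example.

```python
import numpy as np

def kld(p, q, eps=1e-12):
    """Kullback-Leibler divergence D(p || q) between two discrete distributions."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p /= p.sum()
    q /= q.sum()
    return float(np.sum(p * np.log(p / q)))

def assign_clients(client_label_counts, num_mediators, max_clients):
    """Illustrative greedy rescheduling: each client joins the (non-full)
    mediator whose combined label distribution ends up closest to uniform,
    so that training within each mediator sees roughly balanced classes."""
    num_classes = len(client_label_counts[0])
    uniform = np.full(num_classes, 1.0 / num_classes)
    label_sums = [np.zeros(num_classes) for _ in range(num_mediators)]
    sizes = [0] * num_mediators
    assignment = []
    for counts in client_label_counts:
        counts = np.asarray(counts, dtype=float)
        # Only mediators with remaining capacity are candidates.
        open_ms = [m for m in range(num_mediators) if sizes[m] < max_clients]
        # Choose the mediator minimizing KLD to uniform after adding this client.
        best = min(open_ms, key=lambda m: kld(label_sums[m] + counts, uniform))
        label_sums[best] += counts
        sizes[best] += 1
        assignment.append(best)
    return assignment
```

For example, four clients each holding only one of two classes would be paired up so that every mediator trains on a near-uniform mix. The capacity limit keeps mediators comparable in size; without it, a purely KLD-greedy rule tends to pile all clients onto one mediator.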