Computer science
Profiling (computer programming)
Location
Data mining
Matching (statistics)
Information privacy
Representation (politics)
External Data Representation
Machine learning
Theoretical computer science
Artificial intelligence
Mathematics
Philosophy
Statistics
Linguistics
Internet privacy
Politics
Political science
Law
Operating system
Authors
Wentai Wu, Ligang He, Weiwei Lin, Carsten Maple
Identifier
DOI:10.1109/tpds.2023.3265588
Abstract
Federated Learning (FL) has shown great potential as a privacy-preserving solution to learning from decentralized data that are only accessible to end devices (i.e., clients). The data locality constraint offers strong privacy protection but also makes FL sensitive to the condition of local data. Apart from statistical heterogeneity, a large proportion of the clients, in many scenarios, are probably in possession of low-quality data that are biased, noisy or even irrelevant. As a result, they could significantly slow down the convergence of the global model we aim to build and also compromise its quality. In light of this, we first present a new view of local data by looking into the representation space and observing that they converge in distribution to Normal distributions before activation. We provide theoretical analysis to support our finding. Further, we propose FEDPROF, a novel algorithm for optimizing FL over non-IID data of mixed quality. The key to our approach is a distributional representation profiling and matching scheme that uses the global model to dynamically profile data representations and allows for low-cost, lightweight representation matching. Using this scheme, we sample clients adaptively in FL to mitigate the impact of low-quality data on the training process. We evaluated our solution with extensive experiments on different tasks and data conditions under various FL settings. The results demonstrate that the selective behavior of our algorithm leads to a significant reduction in the number of communication rounds and the amount of time (up to 2.4× speedup) for the global model to converge and also provides accuracy gain.
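The abstract's core idea, profiling pre-activation representations as Normal distributions and sampling clients whose profiles match the global one, can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the per-dimension Gaussian KL divergence, and the exponential weighting (with a hypothetical sharpness parameter `alpha`) are illustrative assumptions; FedProf's exact matching metric and sampling rule are defined in the paper itself.

```python
import numpy as np

def profile_representations(reps):
    """Summarize a batch of pre-activation representations (n_samples x dim)
    as a per-dimension Gaussian profile (mean, std), reflecting the
    observation that they converge in distribution to Normal distributions."""
    return reps.mean(axis=0), reps.std(axis=0) + 1e-8  # epsilon avoids zero std

def profile_divergence(mu_g, sigma_g, mu_c, sigma_c):
    """Average per-dimension KL(N_client || N_global) between a client's
    representation profile and the global profile (illustrative metric)."""
    kl = (np.log(sigma_g / sigma_c)
          + (sigma_c**2 + (mu_c - mu_g)**2) / (2.0 * sigma_g**2)
          - 0.5)
    return float(kl.mean())

def sampling_weights(divergences, alpha=1.0):
    """Turn divergence scores into client sampling probabilities:
    clients whose local representations deviate more from the global
    profile are selected less often in each round."""
    scores = np.exp(-alpha * np.asarray(divergences))
    return scores / scores.sum()
```

For example, a client whose profile equals the global one has divergence 0 and receives the largest sampling weight, while clients with noisy or irrelevant data drift away from the global profile and are sampled less, mirroring the selective behavior the abstract credits for the convergence speedup.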