Keywords
Computer science
Utilization
Dual (grammatical number)
Machine learning
Inference
Artificial intelligence
Differential privacy
Probabilistic logic
Raw data
Process (computing)
Feature learning
Feature (linguistics)
Distributed learning
Data mining
Computer security
Philosophy
Art
Literature
Psychology
Programming language
Operating system
Linguistics
Pedagogy
Authors
Yuan Gao, Maoguo Gong, Yu Xie, A. K. Qin, Ke Pan, Yew-Soon Ong
Identifier
DOI:10.1109/tcyb.2021.3139076
Abstract
The performance of machine learning algorithms relies heavily on the availability of a large amount of training data. In reality, however, data usually reside with distributed parties such as different institutions and may not be directly gathered and integrated due to various data-policy constraints. As a result, some parties may have insufficient data available for training machine learning models. In this article, we propose a multiparty dual learning (MPDL) framework to alleviate the problem of limited, poor-quality data in an isolated party. Since the knowledge-sharing processes among multiple parties naturally emerge in dual forms, we show that dual learning is well suited to handling the challenge of missing data, and it explicitly exploits the probabilistic correlation and structural relationship between dual tasks to regularize the training process. We introduce a feature-oriented differential privacy with mathematical proof, in order to avoid possible privacy leakage of raw features in the dual inference process. The approach requires minimal modifications to the existing multiparty learning structure, and each party can build flexible and powerful models separately, whose accuracy is no lower than that of nondistributed self-learning approaches. The MPDL framework achieves significant improvement over state-of-the-art multiparty learning methods, as we demonstrate through simulations on real-world datasets.
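The abstract does not give the exact form of the duality regularizer. A common formulation from the dual-learning literature penalizes disagreement between the two factorizations of the joint likelihood, log P(x) + log P(y|x) versus log P(y) + log P(x|y). The sketch below is an illustrative minimal version under that assumption; the function names and the weight `lam` are hypothetical, not from the paper:

```python
def duality_gap(log_p_x, log_p_y_given_x, log_p_y, log_p_x_given_y):
    """Squared gap between the two factorizations of log P(x, y).

    Dual learning encourages  log P(x) + log P(y|x) ≈ log P(y) + log P(x|y),
    since both sides estimate the same joint log-probability.
    """
    return (log_p_x + log_p_y_given_x - log_p_y - log_p_x_given_y) ** 2


def total_loss(primal_loss, dual_loss, gap, lam=0.1):
    # Joint objective: the two task losses plus the weighted duality
    # regularizer (lam is an illustrative hyperparameter).
    return primal_loss + dual_loss + lam * gap
```

When the two factorizations agree exactly, the regularizer vanishes and training reduces to the sum of the primal and dual task losses.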