Toward Graph Data Collaboration in a Data-Sharing-Free Manner: A Novel Privacy-Preserving Graph Pretraining Model
Computer Science
Graph
Theoretical Computer Science
Authors
Jiarong Xu, Jiaan Wang, Zenan Zhou, Tian Lu
Source
Journal: INFORMS Journal on Computing; Date: 2025-06-04
Identifier
DOI:10.1287/ijoc.2023.0115
Abstract
Graph data, prevalent in domains such as telecommunication, supply chain, and social networks, holds significant potential for business, operations, and social administration. Collaborating on graph data across institutions or users can further unleash its value, making it a highly sought-after practice. However, such collaboration poses risks to information privacy and commercial confidentiality. In response, we introduce a new model-sharing strategy for graph data collaboration. Here, a data owner pretrains a graph neural network (GNN) model on their private graph data and then provides model users with query access to this model. The pretrained GNN acts as an intermediary, encapsulating knowledge from the private data without exposing it directly. Two principles are essential for such a pretrained GNN model: model generalizability and privacy preservation. However, current efforts often fail to achieve both concurrently. To tackle this challenge and promote an open yet secure graph data collaboration framework, we propose a novel privacy-preserving operator. This operator integrates smoothly with graph data augmentation and graph contrastive learning, allowing the pretraining of a GNN that effectively eliminates private links at high risk of exposure while maintaining generalizability. Additionally, to improve model generalizability, we introduce a new method called generalizability learning, which enhances the model’s adaptability when it is deployed on a model user's unseen data. This approach is designed to simulate diverse environments and develop representations that remain invariant across them. Extensive experiments suggest that our model surpasses existing state-of-the-art approaches in striking an effective balance between privacy preservation and generalizability.
History: Accepted by Ram Ramesh, Area Editor for Data Science and Machine Learning.
Funding: This work was supported (to J. Xu) by the National Natural Science Foundation of China [Grants 62206056, 72271059, and 72442011] and the CIPSC-SMP-Zhipu Large Model Cross-Disciplinary Fund.
Supplemental Material: The software that supports the findings of this study is available within the paper and its Supplemental Information ( https://pubsonline.informs.org/doi/suppl/10.1287/ijoc.2023.0115 ) as well as from the IJOC GitHub software repository ( https://github.com/INFORMSJoC/2023.0115 ). The complete IJOC Software and Data Repository is available at https://informsjoc.github.io/ .
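Illustrative sketch (not the authors' implementation): the abstract says the privacy-preserving operator works together with graph data augmentation to remove private links at high risk of exposure, but it does not specify how risk is measured or how links are removed. The toy Python below only illustrates the general idea of combining risk-based edge removal with random edge dropping to produce augmented views for contrastive pretraining. The function name `drop_risky_edges`, the per-edge `risk` scores, and the `threshold` and `drop_prob` parameters are all hypothetical; the actual operator is described in the paper and its repository.

```python
import random


def drop_risky_edges(edges, risk, threshold=0.5, drop_prob=0.2, seed=0):
    """Return an augmented edge list (hypothetical toy operator).

    Edges whose risk score is at or above `threshold` are always removed.
    Each remaining edge is additionally dropped with probability `drop_prob`,
    mimicking ordinary random edge-dropping augmentation.
    """
    rng = random.Random(seed)  # local generator so results are reproducible
    kept = []
    for edge in edges:
        if risk.get(edge, 0.0) >= threshold:
            continue  # high-risk private link: never appears in any view
        if rng.random() < drop_prob:
            continue  # ordinary augmentation noise
        kept.append(edge)
    return kept


# Toy graph: four edges with made-up exposure-risk scores.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
risk = {(0, 1): 0.9, (1, 2): 0.1, (2, 3): 0.2, (3, 0): 0.7}

# Two views with different seeds, as a contrastive objective would consume.
view_a = drop_risky_edges(edges, risk, seed=1)
view_b = drop_risky_edges(edges, risk, seed=2)

# With no random dropping, only the low-risk edges survive.
low_risk_only = drop_risky_edges(edges, risk, drop_prob=0.0)
```

In this sketch the high-risk edges are excluded from every view, so a model trained on the views never sees them, while the random dropping keeps the views distinct from each other.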