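Computer science
Word embedding
Phrase
Identity (music)
Semantics (computer science)
Linkage (software)
Sentence
Word (group theory)
Artificial intelligence
Information retrieval
Key (lock)
Natural language processing
Semantic similarity
User modeling
Process (computing)
Embedding
User interface
Programming language
Chemistry
Gene
Philosophy
Physics
Operating system
Biochemistry
Linguistics
Computer security
Acoustics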
Authors
Hao Gao, Yongqing Wang, Jiangli Shao, Huawei Shen, Xueqi Cheng
Identifier
DOI:10.1109/bigdata52589.2021.9671907
Abstract
User identity linkage aims to link users with the same identity across different social networks. Recently, researchers have modeled the similarities of users' behaviors, such as Points of Interest (PoIs) or User Generated Contents (UGCs), to predict the identities of users. However, solving the problem is non-trivial due to the following challenges: 1) PoIs are always sparse on non-location-based social platforms, so it is impractical to measure the similarities of users solely with PoIs; 2) The similarities of UGCs are hierarchical at the word, phrase, and sentence levels, and modeling this hierarchical structure remains a key challenge; 3) The semantics of words are unreliable: two different words may refer to the same physical appearance of users, indicating that the users share the same identity. To tackle the above problems, we propose UGCLink, a knowledge distillation framework that models UGCs to predict user identities. The framework has two main components: the student network models the similarities of UGCs, and the teacher network guides the student network to learn better word embeddings that reveal the physical appearance of users. The teacher network, a document classification model that classifies UGCs into PoI categories, is trained to guide the word embedding learning process in the student network and thereby circumvent the unreliable-semantics problem. We demonstrate that our proposed method outperforms the state-of-the-art methods by more than 11% in terms of AUC score.
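The teacher-student setup described in the abstract can be illustrated with a generic knowledge distillation objective. The sketch below is not the UGCLink loss, which the abstract does not specify. It assumes a common soft-target formulation in which the student's linkage loss is combined with a KL term pulling the student's PoI-category distribution toward the teacher's. All function names, the `temperature` and `alpha` parameters, and the example logits are hypothetical.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn raw scores into a probability distribution, softened by temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def binary_cross_entropy(prob, label, eps=1e-12):
    """Loss for a predicted probability that two accounts belong to one user."""
    return -(label * math.log(prob + eps) + (1 - label) * math.log(1 - prob + eps))

def distillation_loss(student_logits, teacher_logits, link_prob, link_label,
                      temperature=2.0, alpha=0.5):
    """Hypothetical combined objective (an assumption, not the paper's loss).

    student_logits / teacher_logits: scores over PoI categories for one UGC.
    link_prob / link_label: student's predicted and true linkage for a user pair.
    alpha weights the teacher-guidance term against the linkage task term.
    """
    teacher_soft = softmax(teacher_logits, temperature)
    student_soft = softmax(student_logits, temperature)
    guidance = kl_divergence(teacher_soft, student_soft)
    task = binary_cross_entropy(link_prob, link_label)
    return (1 - alpha) * task + alpha * guidance

# Illustrative values only: the student agrees or disagrees with the teacher.
agree = distillation_loss([2.0, 0.5, 0.1], [2.0, 0.5, 0.1], 0.9, 1)
disagree = distillation_loss([0.1, 0.5, 2.0], [2.0, 0.5, 0.1], 0.9, 1)
```

With identical linkage predictions, the loss is lower when the student's category distribution matches the teacher's, which is the sense in which a teacher classifier can steer what the student's word embeddings encode.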