Computer science
Feature learning
Node (networking)
Cluster analysis
Transformer
Encoding
Artificial intelligence
Representation learning
Heterogeneous network
Natural language processing
Data mining
Wireless network
Telecommunications
Politics
Physics
Structural engineering
Engineering
Quantum mechanics
Voltage
Law
Wireless
Political science
Authors
Bowen Jin,Yu Zhang,Qi Zhu,Jiawei Han
Identifier
DOI:10.1145/3580305.3599376
Abstract
Representation learning on networks aims to derive a meaningful vector representation for each node, thereby facilitating downstream tasks such as link prediction, node classification, and node clustering. In heterogeneous text-rich networks, this task is more challenging due to (1) presence or absence of text: Some nodes are associated with rich textual information, while others are not; (2) diversity of types: Nodes and edges of multiple types form a heterogeneous network structure. As pretrained language models (PLMs) have demonstrated their effectiveness in obtaining widely generalizable text representations, a substantial amount of effort has been made to incorporate PLMs into representation learning on text-rich networks. However, few of them can jointly consider heterogeneous structure (network) information as well as rich textual semantic information of each node effectively. In this paper, we propose Heterformer, a Heterogeneous Network-Empowered Transformer that performs contextualized text encoding and heterogeneous structure encoding in a unified model. Specifically, we inject heterogeneous structure information into each Transformer layer when encoding node texts. Meanwhile, Heterformer is capable of characterizing node/edge type heterogeneity and encoding nodes with or without texts. We conduct comprehensive experiments on three tasks (i.e., link prediction, node classification, and node clustering) on three large-scale datasets from different domains, where Heterformer outperforms competitive baselines significantly and consistently. The code can be found at https://github.com/PeterGriffinJin/Heterformer.
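The sketch below is a minimal, hypothetical PyTorch illustration of the core idea described in the abstract: aggregating a node's heterogeneous neighbors into a "virtual" token and letting each Transformer layer attend to it while encoding the node's text. It is not the authors' implementation (see the linked repository for that); class names such as StructureAwareLayer and NetworkEmpoweredEncoder, and design details like mean-pooled, type-projected neighbor aggregation, are illustrative assumptions.

```python
import torch
import torch.nn as nn


class StructureAwareLayer(nn.Module):
    """One Transformer encoder layer whose self-attention also attends to a
    'virtual' neighbor token that summarizes the node's network neighborhood."""

    def __init__(self, d_model: int = 768, n_heads: int = 12):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                 nn.Linear(4 * d_model, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, text_states: torch.Tensor, neighbor_token: torch.Tensor):
        # text_states: (batch, seq_len, d_model) token hidden states from a PLM
        # neighbor_token: (batch, 1, d_model) aggregated heterogeneous-neighbor embedding
        h = torch.cat([neighbor_token, text_states], dim=1)   # prepend virtual token
        attn_out, _ = self.attn(h, h, h)
        h = self.norm1(h + attn_out)
        h = self.norm2(h + self.ffn(h))
        return h[:, 1:, :], h[:, :1, :]                       # split back into text / virtual token


class NetworkEmpoweredEncoder(nn.Module):
    """Stack of structure-aware layers. Per-type projections give a simple way to
    handle node/edge type heterogeneity; a textless neighbor would contribute only
    its (learned) node embedding rather than a text encoding."""

    def __init__(self, n_layers: int = 2, d_model: int = 768, n_types: int = 3):
        super().__init__()
        self.layers = nn.ModuleList([StructureAwareLayer(d_model) for _ in range(n_layers)])
        self.type_proj = nn.ModuleList([nn.Linear(d_model, d_model) for _ in range(n_types)])

    def aggregate_neighbors(self, neigh_emb: torch.Tensor, neigh_type: torch.Tensor):
        # neigh_emb: (batch, n_neigh, d_model); neigh_type: (batch, n_neigh) integer type ids
        projected = torch.stack(
            [self.type_proj[t](neigh_emb[b, i])
             for b in range(neigh_emb.size(0))
             for i, t in enumerate(neigh_type[b].tolist())]
        ).view_as(neigh_emb)
        return projected.mean(dim=1, keepdim=True)            # (batch, 1, d_model) virtual token

    def forward(self, text_states, neigh_emb, neigh_type):
        virtual = self.aggregate_neighbors(neigh_emb, neigh_type)
        for layer in self.layers:
            text_states, virtual = layer(text_states, virtual)
        return text_states[:, 0, :]                           # first token as the node representation


# Example usage with random tensors standing in for PLM token states and neighbor embeddings.
enc = NetworkEmpoweredEncoder()
text = torch.randn(2, 16, 768)            # token hidden states for 2 nodes
neighbors = torch.randn(2, 5, 768)        # 5 neighbor embeddings per node
types = torch.randint(0, 3, (2, 5))       # heterogeneous neighbor type ids
node_repr = enc(text, neighbors, types)   # (2, 768) node representations
```

Under these assumptions, the resulting node representations could then be scored for link prediction or fed to a classifier/clustering head, matching the three downstream tasks evaluated in the paper.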