LLM-augmented entity alignment: an unsupervised and training-free framework

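Computer science; Embedding; Scalability; Robustness (evolution); Entity linking; Artificial intelligence; Natural language processing; Graph; Consistency (knowledge bases); Homogeneous; Task (project management); Language model; Information retrieval; Word embedding; Named-entity recognition; Graph embedding; Semantic similarity; Flexibility (engineering); Textual entailment; Semantics (computer science); Machine learning; Face (sociological concept); Workflow; Topic model; Coherence (philosophical gambling strategy); Semantic heterogeneity; Knowledge graph; Semantic mapping; Information extraction; Named entity; Performance improvement; Natural language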

Authors

Meixiu Long, Jiahai Wang, Jin Ma, Jianpeng Zhou, Siyuan Chen

Source

Journal: Neural Networks [Elsevier BV]
Date: 2025-09-22 Volume/Issue: 194: 108139-108139

Links

nih.gov, doi.org

Identifier

DOI: 10.1016/j.neunet.2025.108139

Abstract

Entity alignment (EA) is a fundamental task in knowledge graph (KG) integration, aiming to identify equivalent entities across different KGs for a unified and comprehensive representation. Recent advances have explored pre-trained language models (PLMs) to enhance the semantic understanding of entities, achieving notable improvements. However, existing methods face two major limitations. First, they rely heavily on human-annotated labels for training, leading to high computational costs and poor scalability. Second, some approaches use large language models (LLMs) to predict alignments in a multiple-choice question format, but LLM outputs may deviate from expected formats, and predefined options may exclude correct matches, leading to suboptimal performance. To address these issues, we propose LEA, an LLM-augmented entity alignment framework that eliminates the need for labeled data and enhances robustness by mitigating information heterogeneity at both embedding and semantic levels. LEA first introduces an entity textualization module that transforms structural and textual information into a unified format, ensuring consistency and improving entity representations. It then leverages LLMs to enrich entity descriptions, enhancing semantic distinctiveness. Finally, these enriched descriptions are encoded into a shared embedding space, enabling efficient alignment through text retrieval techniques. To balance performance and computational cost, we further propose a selective augmentation strategy that prioritizes the most ambiguous entities for refinement. Experimental results on both homogeneous and heterogeneous KGs demonstrate that LEA outperforms existing models trained on 30% labeled data, achieving a 30% absolute improvement in Hit@1 score. As LLMs and text embedding models advance, LEA is expected to further enhance EA performance, providing a scalable and robust paradigm for practical applications.
The code and dataset can be found at https://github.com/Longmeix/LEA.
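
The pipeline described in the abstract (textualize entities, embed them in a shared space, align by retrieval, then enrich only the most ambiguous entities) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy bag-of-words `embed` (a stand-in for a real text embedding model), the top-1/top-2 margin used as the ambiguity score, and the hard-coded `llm_enriched` text (a stand-in for an LLM call) are all assumptions made for illustration.

```python
import math
from collections import Counter


def textualize(name, triples):
    """Flatten an entity name and its (relation, neighbor) pairs into one string."""
    facts = "; ".join(f"{rel} {nbr}" for rel, nbr in triples)
    return f"{name}. {facts}" if facts else name


def embed(text):
    """Toy stand-in for a text embedding model: a bag-of-words count vector."""
    return Counter(text.lower().replace(".", " ").replace(";", " ").split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(source, target):
    """Rank all target entities for each source entity, best match first."""
    tgt_vecs = {tid: embed(text) for tid, text in target.items()}
    ranked = {}
    for sid, text in source.items():
        vec = embed(text)
        scores = [(tid, cosine(vec, tv)) for tid, tv in tgt_vecs.items()]
        ranked[sid] = sorted(scores, key=lambda p: p[1], reverse=True)
    return ranked


def most_ambiguous(ranked, k):
    """Assumed ambiguity criterion: smallest gap between top-1 and top-2 scores."""
    def margin(sid):
        cands = ranked[sid]
        return cands[0][1] - cands[1][1] if len(cands) > 1 else float("inf")
    return sorted(ranked, key=margin)[:k]


def hit_at_1(ranked, gold):
    """Fraction of source entities whose top-ranked candidate is the gold match."""
    hits = sum(1 for sid, tid in gold.items() if ranked[sid][0][0] == tid)
    return hits / len(gold)


# Two toy KGs; gold labels are used only for evaluation, never for training.
source = {
    "s1": textualize("Paris", [("capital of", "France")]),
    "s2": textualize("Paris", []),  # no structural context: ambiguous
}
target = {
    "t1": textualize("Paris", [("capital of", "France"), ("located in", "Europe")]),
    "t2": textualize("Paris", [("city in", "Texas"), ("located in", "United States")]),
}
gold = {"s1": "t1", "s2": "t2"}

ranked = retrieve(source, target)
before = hit_at_1(ranked, gold)

# Selective augmentation: enrich only the single most ambiguous source entity.
picked = most_ambiguous(ranked, k=1)
llm_enriched = {"s2": "Paris. city in Texas"}  # hypothetical LLM output
for sid in picked:
    source[sid] = llm_enriched.get(sid, source[sid])

after = hit_at_1(retrieve(source, target), gold)
print(picked, before, after)
```

On this toy data the bare "Paris" entity is the one selected for enrichment, since its two candidates score almost equally. In a realistic setting the count vectors would be replaced by dense embeddings from a text embedding model and the exhaustive ranking by an approximate nearest-neighbor index.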
