Using ChatGPT for Entity Matching

Keywords: Computer science; Transformer; Matching (statistics); Task (project management); Artificial intelligence; Context (archaeology); Training set; Set (abstract data type); Similarity (geometry); Machine learning; Data mining; Voltage; Mathematics; Statistics; Engineering; Paleontology; Electrical engineering; Image (mathematics); Programming language; Systems engineering; Biology
Authors
Ralph Peeters, Christian Bizer
Source
Venue: Cornell University - arXiv
Date: 2023-05-05
Citations: 3
Identifier
DOI: 10.48550/arxiv.2305.03423
Abstract
Entity Matching is the task of deciding whether two entity descriptions refer to the same real-world entity. State-of-the-art entity matching methods often rely on fine-tuning Transformer models such as BERT or RoBERTa. Two major drawbacks of using these models for entity matching are that (i) the models require significant amounts of fine-tuning data to reach good performance and (ii) the fine-tuned models are not robust to out-of-distribution entities. In this paper, we investigate using ChatGPT for entity matching as a more robust, training-data-efficient alternative to traditional Transformer models. We perform experiments along three dimensions: (i) general prompt design, (ii) in-context learning, and (iii) the provision of higher-level matching knowledge. We show that ChatGPT is competitive with a fine-tuned RoBERTa model, reaching a zero-shot performance of 82.35% F1 on a challenging matching task for which RoBERTa requires 2000 training examples to reach similar performance. Adding in-context demonstrations to the prompts further improves F1 by up to 7.85% when using similarity-based example selection. Always using the same set of 10 handpicked demonstrations leads to an improvement of 4.92% over the zero-shot performance. Finally, we show that ChatGPT can also be guided by adding higher-level matching knowledge in the form of rules to the prompts. Providing matching rules leads to similar performance gains as providing in-context demonstrations.
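The zero-shot setting described in the abstract reduces to a single prompt per candidate pair. Below is a minimal sketch of such a prompt, assuming the openai Python client (v1+) and an API key in the environment; the prompt wording, model choice, and product pair are illustrative assumptions, not the paper's exact template.

```python
# A minimal sketch of zero-shot entity matching with a ChatGPT-family model,
# assuming the openai Python client (>= 1.0) and OPENAI_API_KEY set in the
# environment. The prompt text and example pair are illustrative only.
from openai import OpenAI

client = OpenAI()

def match_zero_shot(entity_a: str, entity_b: str) -> str:
    """Ask the model whether two entity descriptions refer to the same product."""
    prompt = (
        "Do the following two entity descriptions refer to the same "
        "real-world product? Answer with 'Yes' or 'No'.\n"
        f"Entity 1: {entity_a}\n"
        f"Entity 2: {entity_b}"
    )
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # a ChatGPT model; the paper's exact model is not stated here
        messages=[{"role": "user", "content": prompt}],
        temperature=0,          # deterministic answers make evaluation reproducible
    )
    return response.choices[0].message.content.strip()

print(match_zero_shot(
    "DYMO D1 Tape 12mm x 7m, black on white",
    "Dymo D1 12mm x 7m label tape, black/white",
))
```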
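For the in-context learning dimension, the abstract reports that similarity-based selection of demonstrations improves F1 by up to 7.85%. The sketch below shows one plausible realization: ranking a small labeled pool by TF-IDF cosine similarity to the query pair and prepending the top matches to the prompt. The similarity measure, the pool, and the helper names are assumptions for illustration, not the paper's setup.

```python
# A sketch of similarity-based in-context example selection, assuming a small
# hypothetical pool of labeled (entity_a, entity_b, label) triples. TF-IDF
# cosine similarity is one simple ranking choice, not the paper's method.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

labeled_pool = [
    ("DYMO D1 tape 12mm black/white", "Dymo D1 12mm label tape", "Yes"),
    ("Logitech M185 wireless mouse", "Logitech M185 mouse, grey", "Yes"),
    ("HP 301 ink cartridge black", "HP 302 ink cartridge black", "No"),
]

def select_demonstrations(query_a: str, query_b: str, k: int = 2):
    """Return the k labeled pairs most similar to the query pair."""
    corpus = [f"{a} {b}" for a, b, _ in labeled_pool]
    vectorizer = TfidfVectorizer().fit(corpus + [f"{query_a} {query_b}"])
    pool_vecs = vectorizer.transform(corpus)
    query_vec = vectorizer.transform([f"{query_a} {query_b}"])
    scores = cosine_similarity(query_vec, pool_vecs)[0]
    ranked = sorted(zip(scores, labeled_pool), key=lambda x: -x[0])
    return [pair for _, pair in ranked[:k]]

def build_prompt(query_a: str, query_b: str, demos) -> str:
    """Prepend demonstrations to the yes/no matching question."""
    lines = ["Do the two entity descriptions refer to the same product? "
             "Answer with 'Yes' or 'No'."]
    for a, b, label in demos:
        lines.append(f"Entity 1: {a}\nEntity 2: {b}\nAnswer: {label}")
    lines.append(f"Entity 1: {query_a}\nEntity 2: {query_b}\nAnswer:")
    return "\n\n".join(lines)

demos = select_demonstrations("HP 301XL black ink", "HP 301 XL ink cartridge, black")
print(build_prompt("HP 301XL black ink", "HP 301 XL ink cartridge, black", demos))
```

The same prompt-assembly step could prepend handwritten matching rules instead of demonstrations, which the abstract reports yields similar performance gains.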