
Large and Small models for collaborative cross-lingual data augmentation in entity relationship extraction for low-resource languages

Authors
Yurong Wang, Min Lin, Q. P. Hu, Lirong Bao, Shuangcheng Bai, Yanling Li
Source
Journal: Journal of King Saud University - Computer and Information Sciences [Elsevier]
Volume/Issue: 37 (4); Citations: 1
Identifier
DOI: 10.1007/s44443-025-00055-w
Abstract

Entity relationship extraction tasks conducted in low-resource language domains (such as the medical and military domains) have long faced significant challenges, and data augmentation (DA) is considered an effective solution to this issue. Compared with monolingual DA methods, cross-lingual augmentation methods more effectively enhance data diversity in low-resource language settings by leveraging and extending data resources from other languages. Unlike traditional DA methods, large language model (LLM)-based DA reduces reliance on manual evaluation and generates more fluent and diverse data. The superior performance of LLMs is attributed to their large-scale corpora and computational resources, but these requirements also limit their applicability to low-resource languages. To address these issues, this paper proposes LS-CLDARE, an entity relationship extraction framework based on collaborative cross-lingual DA that employs both large and small models. Specifically, the small language model (SLM) extracts entity information, while the LLM generates cross-lingual samples by combining this entity information with its high-resource-language description via chain-of-thought (CoT) prompts, which guide the model through step-by-step reasoning to better handle complex tasks. To enhance cross-lingual transfer performance, the generated cross-lingual samples are combined with cross-lingual soft prompts and fed into an SLM pretrained on a high-resource-language domain dataset. Through transfer learning and data expansion, the SLM's entity recognition and relation extraction capabilities for low-resource languages are continuously improved. Extensive experiments on ultra-low-resource languages in the Mongolian medical domain and on classical Chinese texts validate the effectiveness of LS-CLDARE.
Compared with other DA methods, LS-CLDARE improves the F1 score by 3.52% to 9.42%; compared with other prompt-based relation extraction methods, it improves the F1 score by 1.8% to 16.93%.
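The collaboration described in the abstract (SLM extracts entities; the LLM consumes those entities plus a high-resource-language description through a CoT prompt to generate augmented samples) can be sketched roughly as follows. This is a minimal illustrative sketch only: the function names, the toy dictionary-based entity extractor, and the placeholder LLM call are all assumptions, not the authors' actual implementation or API.

```python
def slm_extract_entities(sentence):
    """Stand-in for the small model's entity extractor.

    A real system would run a pretrained SLM tagger; here a toy
    lexicon lookup illustrates the interface (token, entity type).
    """
    lexicon = {"aspirin": "DRUG", "headache": "SYMPTOM"}
    return [(w, lexicon[w]) for w in sentence.lower().split() if w in lexicon]


def build_cot_prompt(entities, high_resource_description):
    """Compose a chain-of-thought prompt that combines the extracted
    entities with their high-resource-language description, guiding
    the LLM through step-by-step generation."""
    steps = [
        "Step 1: Review the extracted entities and their types.",
        f"Entities: {entities}",
        "Step 2: Read the high-resource-language description of the relation.",
        f"Description: {high_resource_description}",
        "Step 3: Generate a fluent sentence in the target low-resource "
        "language that preserves the entity pair and their relation.",
    ]
    return "\n".join(steps)


def llm_generate(prompt):
    """Placeholder for the LLM call; a real system would query an
    actual model and return its generated cross-lingual sample."""
    return f"<generated sample conditioned on>\n{prompt}"


# One pass of the augmentation loop on a single source sentence.
entities = slm_extract_entities("Aspirin relieves headache quickly")
prompt = build_cot_prompt(entities, "aspirin treats headache (treats relation)")
sample = llm_generate(prompt)
```

In the full framework, samples produced this way would then be paired with cross-lingual soft prompts and fed to the pretrained SLM for transfer learning; that fine-tuning stage is outside the scope of this sketch.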
