Authors
Yurong Wang,Min Lin,Q. P. Hu,Lirong Bao,Shuangcheng Bai,Yanling Li
Identifiers
DOI:10.1007/s44443-025-00055-w
Abstract
Entity relationship extraction tasks in low-resource language domains (such as the medical and military domains) have long faced significant challenges, and data augmentation (DA) is considered an effective solution for addressing this issue. Compared with monolingual DA methods, cross-lingual augmentation methods more effectively enhance data diversity in low-resource language settings by leveraging and extending data resources derived from other languages. Unlike traditional DA methods, large language model (LLM)-based DA reduces the reliance on manual evaluation and generates more fluent and diverse data. The superior performance of LLMs is attributed to their large-scale corpora and computational resources, but these same requirements limit their applicability to low-resource languages. To address these issues, this paper proposes LS-CLDARE, an entity relationship extraction framework based on collaborative cross-lingual DA that employs both large and small models. Specifically, the small language model (SLM) is responsible for extracting entity information, while the LLM generates cross-lingual samples by combining this entity information with its high-resource language description via chain-of-thought (CoT) prompts, which guide the model through step-by-step reasoning to better handle complex tasks. To achieve enhanced cross-lingual transfer performance, the generated cross-lingual samples are combined with cross-lingual soft prompts and input into an SLM pretrained on a high-resource language domain dataset. Through transfer learning and data expansion, the SLM's entity recognition and relation extraction capabilities for low-resource languages are continuously improved. Extensive experiments conducted on ultralow-resource languages in the Mongolian medical domain and on classical Chinese texts validate the effectiveness of LS-CLDARE. Compared with other DA methods, LS-CLDARE improves the F1 score by 3.52% to 9.42%, and compared with other prompt-based relation extraction methods, it improves the F1 score by 1.8% to 16.93%.