Prompt Framework for Extracting Scale-Related Knowledge Entities from Chinese Medical Literature: Development and Evaluation Study

Authors
Jie Hao, Zhenli Chen, Qin Peng, Liang Zhao, Wanqing Zhao, Shan Cong, Junlian Li, Jiao Li, Qing Qian, Haixia Sun
Source
Journal: Journal of Medical Internet Research [JMIR Publications]
Volume and article number: 27, e67033
Identifier
DOI: 10.2196/67033
Abstract

Background
Measurement-based care improves patient outcomes by using standardized scales, but its widespread adoption is hindered by a lack of accessible, structured knowledge, particularly because much of that knowledge sits in unstructured Chinese medical literature. Extracting scale-related knowledge entities from these texts is challenging because annotated data are limited. While large language models (LLMs) show promise in named entity recognition (NER), specialized prompting strategies are needed to recognize medical scale-related entities accurately, especially in low-resource settings.

Objective
This study aims to develop and evaluate MedScaleNER, a task-oriented prompt framework designed to optimize LLM performance in recognizing medical scale-related entities in Chinese medical literature.

Methods
MedScaleNER combines three strategies: demonstration retrieval for in-context learning, chain-of-thought prompting, and self-verification. The framework dynamically retrieves the most relevant examples using a k-nearest neighbors approach and decomposes the NER task into two subtasks: entity type identification and entity labeling. Self-verification is then used to ensure the reliability of the final output. A dataset of manually annotated Chinese medical journal papers was constructed, focusing on three key entity types: scale names, measurement concepts, and measurement items. Experiments varied the number of examples and the proportion of training data to evaluate performance in low-resource settings. MedScaleNER was also compared with locally fine-tuned models.

Results
The CMedS-NER (Chinese Medical Scale Corpus for Named Entity Recognition) dataset, containing 720 papers with 27,499 manually annotated scale-related knowledge entities, was used for evaluation. Initial experiments identified GLM-4-0520 as the best-performing LLM among the six models tested. When applied with GLM-4-0520, MedScaleNER significantly improved NER performance for scale-related entities, achieving a macro F1-score of 59.64% under exact string matching with the full training dataset. Performance peaked with 20-shot demonstrations. In low-resource scenarios (eg, 1% of the training data), MedScaleNER outperformed all tested locally fine-tuned models. Ablation studies highlighted the importance of demonstration retrieval and self-verification for model reliability. Error analysis revealed four main types of mistakes (identification errors, type errors, boundary errors, and missing entities), indicating areas for further improvement.

Conclusions
MedScaleNER advances the application of LLMs and prompt engineering to specialized NER tasks in Chinese medical literature. By addressing the challenges of unstructured text and limited annotated data, and by adapting to various biomedical contexts, MedScaleNER supports more efficient and reliable knowledge extraction, contributing to broader implementation of measurement-based care and to improved clinical and research outcomes.
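
The Methods describe retrieving demonstrations with a k-nearest neighbors approach before prompting the LLM. The Python sketch below shows one way such retrieval-based few-shot prompting can work; it is not the authors' implementation. Character-bigram cosine similarity stands in for whatever text representation MedScaleNER actually uses, and the `Demo` class, the label names `SCALE_NAME`, `MEASUREMENT_CONCEPT` and `MEASUREMENT_ITEM`, the prompt wording, and the example sentences are all hypothetical.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class Demo:
    """One annotated training sentence used as a candidate demonstration (hypothetical format)."""
    text: str
    entities: list[tuple[str, str]]  # (entity string, entity type)


def bigram_vector(text: str) -> Counter:
    # Character-bigram counts: a hypothetical stand-in for a real text encoder.
    return Counter(text[i:i + 2] for i in range(len(text) - 1))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve_knn(query: str, pool: list[Demo], k: int) -> list[Demo]:
    # Rank every candidate by similarity to the query and keep the k nearest.
    q = bigram_vector(query)
    ranked = sorted(pool, key=lambda d: cosine(q, bigram_vector(d.text)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, pool: list[Demo], k: int = 20) -> str:
    # Hypothetical label set and wording; not the paper's actual prompt.
    parts = ["Extract SCALE_NAME, MEASUREMENT_CONCEPT and MEASUREMENT_ITEM entities."]
    for demo in retrieve_knn(query, pool, k):
        labeled = "; ".join(f"{ent} -> {typ}" for ent, typ in demo.entities)
        parts.append(f"Text: {demo.text}\nEntities: {labeled}")
    parts.append(f"Text: {query}\nEntities:")
    return "\n\n".join(parts)


pool = [
    Demo("采用汉密尔顿抑郁量表评估患者抑郁程度",
         [("汉密尔顿抑郁量表", "SCALE_NAME"), ("抑郁程度", "MEASUREMENT_CONCEPT")]),
    Demo("使用匹兹堡睡眠质量指数评价睡眠质量",
         [("匹兹堡睡眠质量指数", "SCALE_NAME"), ("睡眠质量", "MEASUREMENT_CONCEPT")]),
    Demo("焦虑自评量表包含20个条目", [("焦虑自评量表", "SCALE_NAME")]),
]
query = "采用汉密尔顿焦虑量表评估患者焦虑程度"
print(build_prompt(query, pool, k=2))
```

With `k=2`, the Hamilton Depression Scale sentence and the Self-Rating Anxiety Scale sentence share the most bigrams with the query and are selected, while the sleep-quality sentence is left out. The default `k=20` echoes the 20-shot setting the Results report as best; a larger pool would more plausibly use a dense sentence encoder and an approximate nearest-neighbor index than this linear scan.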

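The Results report a macro F1-score under exact string matching. A minimal sketch of that metric, assuming entities are compared as (document ID, entity string, entity type) triples (the paper may instead match on character offsets): a prediction counts only if both its string and its type match a gold entity exactly, F1 is computed per entity type, and the per-type scores are averaged with equal weight. Under this rule, the boundary and type errors named in the error analysis each produce a false positive and a false negative. The example annotations are hypothetical.

```python
Entity = tuple[str, str, str]  # (doc_id, entity string, entity type); assumed format


def macro_f1_exact(gold: set[Entity], pred: set[Entity]) -> float:
    """Unweighted mean of per-type F1, counting only exact (string, type) matches."""
    types = sorted({typ for _, _, typ in gold | pred})
    scores = []
    for typ in types:
        g = {e for e in gold if e[2] == typ}
        p = {e for e in pred if e[2] == typ}
        tp = len(g & p)  # exact matches only; partial overlaps earn nothing
        precision = tp / len(p) if p else 0.0
        recall = tp / len(g) if g else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores) if scores else 0.0


gold = {
    ("d1", "汉密尔顿抑郁量表", "SCALE_NAME"),
    ("d1", "抑郁程度", "MEASUREMENT_CONCEPT"),
    ("d2", "睡眠质量", "MEASUREMENT_CONCEPT"),
    ("d2", "入睡时间", "MEASUREMENT_ITEM"),
}
pred = {
    ("d1", "汉密尔顿抑郁量表", "SCALE_NAME"),    # exact match
    ("d1", "抑郁", "MEASUREMENT_CONCEPT"),      # boundary error
    ("d2", "睡眠质量", "MEASUREMENT_CONCEPT"),  # exact match
    ("d2", "入睡时间", "MEASUREMENT_CONCEPT"),  # type error
}
print(f"macro F1 = {macro_f1_exact(gold, pred):.4f}")
```

This prints `macro F1 = 0.4667`: the per-type F1 scores are 1.0 for scale names, 0.4 for measurement concepts, and 0.0 for measurement items, and macro averaging weights the three types equally regardless of how many entities each has.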