Fine-tuning large language models for rare disease concept normalization

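Keywords: Computer science; Normalization (sociology); Natural language processing; Fine-tuning; Sentence; Identifier; Artificial intelligence; Set (abstract data type); Language model; Programming language; Sociology; Anthropology; Physics; Quantum mechanics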

Authors

Andy Wang, Cong Liu, Jingye Yang, Chunhua Weng

Source

Journal: Journal of the American Medical Informatics Association [Oxford University Press]
Date: 2024-06-03; Volume (Issue): 31 (9); Pages: 2076-2083; Citations: 6

Identifier

DOI: 10.1093/jamia/ocae133

Abstract

Objective: We aim to develop a novel method for rare disease concept normalization by fine-tuning Llama 2, an open-source large language model (LLM), using a domain-specific corpus sourced from the Human Phenotype Ontology (HPO).

Methods: We developed an in-house template-based script to generate two corpora for fine-tuning. The first (NAME) contains standardized HPO names, sourced from the HPO vocabularies, along with their corresponding identifiers. The second (NAME+SYN) includes HPO names and half of the concept’s synonyms as well as identifiers. Subsequently, we fine-tuned Llama 2 (Llama2-7B) for each sentence set and conducted an evaluation using a range of sentence prompts and various phenotype terms.

Results: When the phenotype terms for normalization were included in the fine-tuning corpora, both models demonstrated nearly perfect performance, averaging over 99% accuracy. In comparison, ChatGPT-3.5 has only ∼20% accuracy in identifying HPO IDs for phenotype terms. When single-character typos were introduced in the phenotype terms, the accuracy of NAME and NAME+SYN is 10.2% and 36.1%, respectively, but increases to 61.8% (NAME+SYN) with additional typo-specific fine-tuning. For terms sourced from HPO vocabularies as unseen synonyms, the NAME model achieved 11.2% accuracy, while the NAME+SYN model achieved 92.7% accuracy.

Conclusion: Our fine-tuned models demonstrate ability to normalize phenotype terms unseen in the fine-tuning corpus, including misspellings, synonyms, terms from other ontologies, and laymen’s terms. Our approach provides a solution for the use of LLMs to identify named medical entities from clinical narratives, while successfully normalizing them to standard concepts in a controlled vocabulary.
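The abstract does not reproduce the in-house script, so the following is only a minimal sketch of how template-based corpus generation of this kind might look. The sentence templates, the function names (`build_corpus`, `split_synonyms`, `add_typo`), the toy concept records, and the choice of character substitution as the typo model are all assumptions made for illustration; the synonym lists are placeholders and are not copied from HPO.

```python
import random

# Hypothetical sentence templates; the templates used in the paper are not
# given in the abstract.
TEMPLATES = [
    "The Human Phenotype Ontology identifier of {term} is {hpo_id}.",
    "{term} is normalized to {hpo_id}.",
]

# Toy concept records for illustration only. The synonym lists are
# placeholders, not an excerpt of the HPO vocabulary.
CONCEPTS = [
    {"id": "HP:0001250", "name": "Seizure",
     "synonyms": ["Epileptic seizure", "Seizures", "Fits", "Convulsions"]},
    {"id": "HP:0000252", "name": "Microcephaly",
     "synonyms": ["Small head", "Reduced head circumference"]},
]


def split_synonyms(synonyms, rng):
    """Shuffle the synonyms and return (seen_half, unseen_half)."""
    shuffled = list(synonyms)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def build_corpus(concepts, include_synonyms, seed=0):
    """Build fine-tuning sentences that pair each term with its identifier.

    include_synonyms=False mimics a NAME-style corpus (names only);
    include_synonyms=True mimics a NAME+SYN-style corpus (names plus half
    of each concept's synonyms). Held-out synonyms are returned so they can
    serve as unseen evaluation terms.
    """
    rng = random.Random(seed)
    sentences, held_out = [], []
    for concept in concepts:
        terms = [concept["name"]]
        if include_synonyms:
            seen, unseen = split_synonyms(concept["synonyms"], rng)
            terms.extend(seen)
            held_out.extend((syn, concept["id"]) for syn in unseen)
        for term in terms:
            for template in TEMPLATES:
                sentences.append(template.format(term=term, hpo_id=concept["id"]))
    return sentences, held_out


def add_typo(term, rng):
    """Substitute one alphabetic character (an assumed typo model)."""
    positions = [i for i, ch in enumerate(term) if ch.isalpha()]
    i = rng.choice(positions)
    candidates = [ch for ch in "abcdefghijklmnopqrstuvwxyz" if ch != term[i].lower()]
    return term[:i] + rng.choice(candidates) + term[i + 1:]


name_corpus, _ = build_corpus(CONCEPTS, include_synonyms=False)
name_syn_corpus, unseen_synonyms = build_corpus(CONCEPTS, include_synonyms=True)
typo_term = add_typo("Microcephaly", random.Random(1))
```

Under these assumptions, `name_corpus` holds one sentence per template for each HPO name, `name_syn_corpus` additionally covers the sampled half of the synonyms, and `unseen_synonyms` keeps the remaining (term, identifier) pairs for testing normalization of terms absent from the fine-tuning data.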
