Generative large language models are all-purpose text analytics engines: text-to-text learning is all your need

Keywords: computer science, generative grammar, artificial intelligence, Transformer, natural language processing, relation extraction, inference, machine learning, generative models, language models, normalization (sociology), information extraction, physics, quantum mechanics, voltage, sociology, anthropology
Authors
Peng Cheng, Xi Yang, Aokun Chen, Zehao Yu, Kaleb E Smith, Anthony Costa, Mona G. Flores, Jiang Bian, Yonghui Wu
Source
Journal: Journal of the American Medical Informatics Association [Oxford University Press]
Volume/Issue: 31(9): 1892-1903; Citations: 15
Identifier
DOI:10.1093/jamia/ocae078
Abstract

Objective: To solve major clinical natural language processing (NLP) tasks using a unified text-to-text learning architecture based on a generative large language model (LLM) via prompt tuning.

Methods: We formulated 7 key clinical NLP tasks as text-to-text learning and solved them using one unified generative clinical LLM, GatorTronGPT, developed using GPT-3 architecture and trained with up to 20 billion parameters. We adopted soft prompts (ie, trainable vectors) with a frozen LLM, where the LLM parameters were not updated (ie, frozen) and only the vectors of the soft prompts were updated, known as prompt tuning. We added additional soft prompts as a prefix to the input layer, which were optimized during prompt tuning. We evaluated the proposed method using 7 clinical NLP tasks and compared it with previous task-specific solutions based on Transformer models.

Results and Conclusion: The proposed approach achieved state-of-the-art performance for 5 out of 7 major clinical NLP tasks using one unified generative LLM. Our approach outperformed previous task-specific Transformer models by ∼3% for concept extraction and 7% for relation extraction applied to social determinants of health, 3.4% for clinical concept normalization, 3.4%-10% for clinical abbreviation disambiguation, and 5.5%-9% for natural language inference. Our approach also outperformed a previously developed prompt-based machine reading comprehension (MRC) model, GatorTron-MRC, for clinical concept and relation extraction. The proposed approach can deliver the "one model for all" promise from training to deployment using a unified generative LLM.
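The prompt-tuning setup the abstract describes — freezing all LLM weights and training only a small set of soft-prompt vectors prepended to the input embeddings — can be sketched in PyTorch. This is a minimal illustrative sketch, not the authors' GatorTronGPT code: the tiny Transformer backbone, dimensions, and parameter names below are all assumptions chosen for brevity.

```python
import torch
import torch.nn as nn

class SoftPromptModel(nn.Module):
    """Sketch of prompt tuning: a frozen backbone (stand-in for a large
    generative LLM) plus trainable soft-prompt vectors prefixed to the
    token embeddings. Only the soft prompts receive gradient updates."""

    def __init__(self, vocab_size=100, d_model=32, n_prompt=8):
        super().__init__()
        # Stand-in for the frozen generative LLM (illustrative only)
        self.embed = nn.Embedding(vocab_size, d_model)
        self.backbone = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=4, batch_first=True
        )
        self.lm_head = nn.Linear(d_model, vocab_size)
        # Freeze every backbone parameter registered so far
        for p in self.parameters():
            p.requires_grad = False
        # The soft prompt is created after the freeze loop, so it stays
        # trainable -- this is the only parameter updated during tuning.
        self.soft_prompt = nn.Parameter(torch.randn(n_prompt, d_model) * 0.02)

    def forward(self, input_ids):
        tok = self.embed(input_ids)  # (batch, seq, d_model)
        # Prepend the same soft-prompt prefix to every sequence in the batch
        prompt = self.soft_prompt.unsqueeze(0).expand(tok.size(0), -1, -1)
        x = torch.cat([prompt, tok], dim=1)
        return self.lm_head(self.backbone(x))

model = SoftPromptModel()
# Verify that only the soft prompt is trainable
trainable = [n for n, p in model.named_parameters() if p.requires_grad]
print(trainable)  # only 'soft_prompt'
```

Because the backbone is frozen, each of the 7 clinical tasks needs only its own small set of prompt vectors rather than a full model copy, which is what enables the "one model for all" deployment the paper emphasizes.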