
Extracting lung cancer staging descriptors from pathology reports: A generative language model approach

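Computer science, Lung cancer, Artificial intelligence, Natural language processing, Pathology, Generative model, Generative grammar, Medicine
Authors
Hyeongmin Cho, Sooyoung Yoo, Borham Kim, Sowon Jang, Leonard Sunwoo, Sang-Hwan Kim, Donghyoung Lee, Seok Kim, Sejin Nam, Jin-Haeng Chung
Source
Journal: Journal of Biomedical Informatics [Elsevier BV]
Volume/Issue: 157: 104720-104720  Cited by: 1
Identifier
DOI: 10.1016/j.jbi.2024.104720
Abstract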

In oncology, electronic health records contain key textual information for the diagnosis, staging, and treatment planning of patients with cancer. However, processing text data takes considerable time and effort, which limits the use of these data. Recent advances in natural language processing (NLP), including large language models, can be applied to cancer research. In particular, extracting the information required for pathological staging from surgical pathology reports makes it possible to update cancer staging according to the latest staging guidelines. This study has two main objectives. The first is to evaluate the performance of fine-tuned generative language models (GLMs) in extracting information from text-based surgical pathology reports and in determining pathological stages based on the extracted information for patients with lung cancer. The second is to determine the feasibility of using relatively small GLMs for information extraction in a resource-constrained computing environment. Lung cancer surgical pathology reports were collected from the Common Data Model database of Seoul National University Bundang Hospital (SNUBH), a tertiary hospital in Korea. We selected 42 descriptors necessary for tumor-node (TN) classification based on these reports and created a gold standard validated by two clinical experts. The pathology reports and gold standard were used to generate prompt-response pairs for training and evaluating GLMs, which were then used to extract the information required for staging from pathology reports. We evaluated the information extraction performance of six trained models as well as their performance in TN classification using the extracted information.
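The prompt-response step described above can be pictured with a minimal Python sketch. Everything specific in it is an assumption made for illustration: the instruction wording, the `build_pair` helper, the JSON response format, the sample report, and the two descriptor names are all hypothetical, since the abstract does not give the study's actual prompt template or list its 42 descriptors.

```python
import json

# Hypothetical instruction text; the study's actual prompt wording is not given in the abstract.
INSTRUCTION = (
    "Extract the lung cancer staging descriptors from the pathology report "
    "below and answer in JSON."
)


def build_pair(report_text, gold_descriptors):
    """Turn one report and its gold-standard descriptors into a prompt-response pair."""
    prompt = (
        f"{INSTRUCTION}\n\n"
        f"### Report\n{report_text.strip()}\n\n"
        f"### Descriptors\n"
    )
    # sort_keys keeps the target string stable, so identical annotations
    # always serialize to the same training response.
    response = json.dumps(gold_descriptors, sort_keys=True, ensure_ascii=False)
    return {"prompt": prompt, "response": response}


# Made-up report and descriptor names, for illustration only.
report = "Lung, right upper lobe, lobectomy: adenocarcinoma, 2.3 cm. Lymph nodes: 0/12."
gold = {"tumor_size_cm": 2.3, "positive_lymph_nodes": 0}

pair = build_pair(report, gold)
print(pair["prompt"])
print(pair["response"])
```

Serializing the gold standard as a structured string is one plausible design choice: it lets the model output be parsed and compared field by field during evaluation. The study may have used a different response format.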
The Deductive Mistral-7B model, which was pre-trained with the deductive dataset, showed the best overall performance, with an exact match ratio of 92.24% in information extraction and an accuracy of 0.9876 in classification (predicting the T and N classifications concurrently). This study demonstrated that training GLMs with deductive datasets can improve information extraction performance, and that GLMs with a relatively small number of parameters, approximately seven billion, can achieve high performance on this task. The proposed GLM-based information extraction method is expected to be useful for clinical decision support, lung cancer staging, and research.
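The two reported metrics can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' evaluation code: the abstract does not define the exact match ratio precisely, so the sketch assumes a report counts as a match only when every extracted descriptor equals the gold standard, and it treats a case as correctly classified only when both the T and N categories are right. The function names and toy data are hypothetical.

```python
def exact_match_ratio(predictions, references):
    """Fraction of reports whose predicted descriptors all equal the gold standard.

    Assumption: matching is all-or-nothing per report.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    matches = sum(1 for pred, ref in zip(predictions, references) if pred == ref)
    return matches / len(references)


def concurrent_tn_accuracy(predicted, gold):
    """Fraction of cases where the T and N categories are both predicted correctly."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        return 0.0
    correct = sum(
        1
        for (pred_t, pred_n), (gold_t, gold_n) in zip(predicted, gold)
        if pred_t == gold_t and pred_n == gold_n
    )
    return correct / len(gold)


# Toy data, made up for illustration; descriptor names are hypothetical.
gold_descriptors = [
    {"tumor_size_cm": 2.3, "positive_lymph_nodes": 0},
    {"tumor_size_cm": 4.1, "positive_lymph_nodes": 2},
]
pred_descriptors = [
    {"tumor_size_cm": 2.3, "positive_lymph_nodes": 0},
    {"tumor_size_cm": 4.1, "positive_lymph_nodes": 1},  # one descriptor is wrong
]

gold_tn = [("T1c", "N0"), ("T2b", "N1"), ("T3", "N2"), ("T1a", "N0")]
pred_tn = [("T1c", "N0"), ("T2b", "N0"), ("T3", "N2"), ("T1a", "N0")]  # one N is wrong

emr = exact_match_ratio(pred_descriptors, gold_descriptors)
accuracy = concurrent_tn_accuracy(pred_tn, gold_tn)
print(emr, accuracy)
```

Under the all-or-nothing assumption, a single wrong descriptor makes the whole report a miss, which is why the concurrent T and N accuracy is a stricter measure than scoring T and N separately.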