
Automated Pathologic TN Classification Prediction and Rationale Generation From Lung Cancer Surgical Pathology Reports Using a Large Language Model Fine-Tuned With Chain-of-Thought: Algorithm Development and Validation Study

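Keywords: Background (archaeology); Computer science; Artificial intelligence; Medicine; Lung cancer; Natural language processing; Parsing; Medical physics; Pathology; Machine learning; Biology; Paleontology
Authors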
Sanghwan Kim,Sowon Jang,Borham Kim,Leonard Sunwoo,Seok Kim,Jin-Haeng Chung,Sejin Nam,Hyeongmin Cho,Donghyoung Lee,Keehyuck Lee,Sooyoung Yoo
Source
Journal: JMIR Medical Informatics [JMIR Publications]
Volume/issue: 12: e67056-e67056 Cited by: 3
Identifier
DOI:10.2196/67056
Abstract

Background: Traditional rule-based natural language processing approaches in electronic health record systems are effective but are often time-consuming and prone to errors when handling unstructured data. This is primarily due to the substantial manual effort required to parse and extract information from diverse types of documentation. Recent advancements in large language model (LLM) technology have made it possible to automatically interpret medical context and support pathologic staging. However, existing LLMs encounter challenges in rapidly adapting to specialized guideline updates. In this study, we fine-tuned an LLM specifically for lung cancer pathologic staging, enabling it to incorporate the latest guidelines for pathologic TN classification.

Objective: This study aims to evaluate the performance of fine-tuned generative language models in automatically inferring pathologic TN classifications and extracting their rationale from lung cancer surgical pathology reports. By addressing the inefficiencies and extensive parsing efforts associated with rule-based methods, this approach seeks to enable rapid and accurate reclassification aligned with the latest cancer staging guidelines.

Methods: We conducted a comparative performance evaluation of 6 open-source LLMs for automated TN classification and rationale generation, using 3216 deidentified lung cancer surgical pathology reports based on the American Joint Committee on Cancer (AJCC) Cancer Staging Manual, 8th edition, collected from a tertiary hospital. The dataset was preprocessed by segmenting each report according to lesion location and morphological diagnosis. Performance was assessed using two evaluation metrics: exact match ratio (EMR), which measures classification accuracy, and semantic match ratio (SMR), which measures the contextual alignment of the generated rationales.

Results: Among the 6 models, the Orca2_13b model achieved the highest performance, with an EMR of 0.934 and an SMR of 0.864. The Orca2_7b model also performed strongly, with an EMR of 0.914 and an SMR of 0.854. The Llama2_7b model achieved an EMR of 0.864 and an SMR of 0.771, while the Llama2_13b model showed an EMR of 0.762 and an SMR of 0.690. The Mistral_7b and Llama3_8b models performed worst, with EMRs of 0.572 and 0.489 and SMRs of 0.377 and 0.456, respectively. Overall, the Orca2 models consistently outperformed the others in both TN stage classification and rationale generation.

Conclusions: The generative language model approach presented in this study has the potential to enhance and automate TN classification in complex cancer staging, supporting both clinical practice and oncology data curation. With additional fine-tuning based on cancer-specific guidelines, this approach can be effectively adapted to other cancer types.
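
As an illustration of the EMR metric mentioned in the abstract, the sketch below computes an exact match ratio over paired predictions and reference labels. The abstract does not give the formula, so this assumes EMR is the fraction of reports whose predicted T and N categories both match the reference exactly. The function name `exact_match_ratio` and the sample labels are hypothetical and are not taken from the study. SMR is not sketched here because the abstract does not say how semantic alignment is scored.

```python
from typing import Sequence, Tuple

# A TN label pair, e.g. ("pT1b", "pN0"). The label strings are illustrative only.
TN = Tuple[str, str]


def exact_match_ratio(predicted: Sequence[TN], reference: Sequence[TN]) -> float:
    """Fraction of reports whose predicted (T, N) pair equals the reference pair.

    Assumption: a report counts as a match only when both T and N agree.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        raise ValueError("at least one report is required")
    matches = sum(1 for p, r in zip(predicted, reference) if p == r)
    return matches / len(reference)


# Hypothetical example: 3 of 4 reports match on both T and N.
predicted = [("pT1b", "pN0"), ("pT2a", "pN1"), ("pT3", "pN0"), ("pT1a", "pN2")]
reference = [("pT1b", "pN0"), ("pT2a", "pN0"), ("pT3", "pN0"), ("pT1a", "pN2")]
print(exact_match_ratio(predicted, reference))  # 0.75
```

Under this definition a prediction with the correct T category but the wrong N category counts as a miss, which is the strictest reading of "exact match"; the study may score T and N separately.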
