
Large Language Model Evaluation in Traditional Chinese Medicine for Stroke: Quantitative Benchmarking Study

Authors
Hongyan Long, Yang Deng, Yaoguang Guo, Zhencai Shen, Yuzhu Zhang, Ji Bao, Yang He
Source
Journal: JMIR Formative Research [JMIR Publications Inc.]
Volume and article number: 9, e81545
Identifier
DOI: 10.2196/81545
Abstract

Background: The application of large language models (LLMs) in medicine is advancing rapidly. However, evaluating LLM capabilities in specialized domains such as traditional Chinese medicine (TCM), which has a unique theoretical system and cognitive framework, remains a sizable challenge.

Objective: This study aimed to provide an empirical evaluation of different types of LLMs in the specialized domain of TCM for stroke.

Methods: The Traditional Chinese Medicine-Stroke Evaluation Dataset (TCM-SED), a 203-question benchmark, was systematically constructed. The dataset includes 3 question paradigms (short-answer, multiple-choice, and essay questions) and covers multiple knowledge dimensions, including diagnosis, pattern differentiation and treatment, herbal formulas, acupuncture, interpretation of classical texts, and patient communication. Gold standard answers were established through multiexpert cross-validation and consensus. The TCM-SED was then used to comprehensively test 2 representative LLMs: GPT-4o (a leading international general-purpose model) and DeepSeek-R1 (a large model trained primarily on Chinese corpora).

Results: The test results showed that model capabilities diverged across cognitive levels. In the objective sections emphasizing precise knowledge recall, DeepSeek-R1 outperformed GPT-4o across the board, leading by more than 17 percentage points in the multiple-choice section (96/137, 70.1% vs 72/137, 52.6%, respectively). Conversely, in the essay section, which tested knowledge integration and complex reasoning, GPT-4o notably outperformed DeepSeek-R1. For instance, in the interpretation of classical texts category, GPT-4o achieved a scoring rate of 90.5% (181/200), far exceeding DeepSeek-R1 (147/200, 73.5%).

Conclusions: This empirical study demonstrates that Chinese-centric models have a substantial advantage in static knowledge tasks within the TCM domain, whereas leading general-purpose models exhibit stronger dynamic reasoning and content generation capabilities. The TCM-SED, developed as the benchmark for this study, serves as an effective quantitative tool for evaluating and selecting appropriate LLMs for TCM scenarios. It also offers a valuable data foundation and a new research direction for future model optimization and alignment.
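The Results mix raw counts, rounded percentages, and a "lead" between models, so it can help to recompute the figures directly. The following is a minimal sketch, not part of the TCM-SED release: the names `Score`, `percent`, and `gap_in_points` are hypothetical helpers, and the only inputs are the counts stated in the abstract. It recomputes each scoring rate from its count and expresses the difference between models in percentage points rather than as a relative percentage.

```python
from typing import NamedTuple


class Score(NamedTuple):
    """One model's result on one section (hypothetical helper type)."""
    model: str
    earned: int    # correct answers, or points awarded for essays
    possible: int  # number of questions, or maximum points


def percent(score: Score) -> float:
    """Scoring rate as an unrounded percentage."""
    return 100 * score.earned / score.possible


def gap_in_points(leader: Score, other: Score) -> float:
    """Difference between two scoring rates, in percentage points."""
    return percent(leader) - percent(other)


# Counts exactly as reported in the abstract; nothing else is assumed.
MULTIPLE_CHOICE = (Score("DeepSeek-R1", 96, 137), Score("GPT-4o", 72, 137))
CLASSICAL_TEXTS = (Score("GPT-4o", 181, 200), Score("DeepSeek-R1", 147, 200))

if __name__ == "__main__":
    for label, (leader, other) in [
        ("Multiple-choice", MULTIPLE_CHOICE),
        ("Classical texts (essay)", CLASSICAL_TEXTS),
    ]:
        print(
            f"{label}: {leader.model} {percent(leader):.1f}% vs "
            f"{other.model} {percent(other):.1f}% "
            f"(gap {gap_in_points(leader, other):.1f} percentage points)"
        )
```

Run as a script, this prints a 17.5-point gap for the multiple-choice section and a 17.0-point gap for classical text interpretation. Computing the gap from unrounded rates avoids the small drift that can come from subtracting already-rounded percentages.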
