Human versus artificial intelligence: evaluating ChatGPT’s performance in conducting published systematic reviews with meta-analysis in chronic pain research

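Meta-analysis; Systematic review; Forest plot; Data extraction; Pooling; Artificial intelligence; Random forest; Computer science; Statistics; Machine learning; Sensitivity (control systems); Gold standard (test); Data mining; Medicine; MEDLINE; Mathematics; Internal medicine; Engineering; Electronic engineering; Political science; Law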
Authors
Anam Purewal, Kalli Fautsch, Johana Klasová, Nasir Hussain, Ryan S. D’Souza
Source
Journal: Regional Anesthesia and Pain Medicine [BMJ]
Volume/Issue: rapm-106358
Identifier
DOI: 10.1136/rapm-2024-106358
Abstract

Introduction: Artificial intelligence (AI), particularly large language models like Chat Generative Pre-Trained Transformer (ChatGPT), has demonstrated potential in streamlining research methodologies. Systematic reviews and meta-analyses, often considered the pinnacle of evidence-based medicine, are inherently time-intensive and demand meticulous planning, rigorous data extraction, thorough analysis, and careful synthesis. Despite promising applications of AI, its utility in conducting systematic reviews with meta-analysis remains unclear. This study evaluated ChatGPT’s accuracy in conducting key tasks of a systematic review with meta-analysis.

Methods: This validation study used data from a published meta-analysis on emotional functioning after spinal cord stimulation. ChatGPT-4o performed title/abstract screening, full-text study selection, and data pooling for this systematic review with meta-analysis. Comparisons were made against human-executed steps, which were considered the gold standard. Outcomes of interest included accuracy, sensitivity, specificity, positive predictive value, and negative predictive value for screening and full-text review tasks. We also assessed for discrepancies in pooled effect estimates and forest plot generation.

Results: For title and abstract screening, ChatGPT achieved an accuracy of 70.4%, sensitivity of 54.9%, and specificity of 80.1%. In the full-text screening phase, accuracy was 68.4%, sensitivity 75.6%, and specificity 66.8%. ChatGPT successfully pooled data for five forest plots, achieving 100% accuracy in calculating pooled mean differences, 95% CIs, and heterogeneity estimates (I² score and tau-squared values) for most outcomes, with minor discrepancies in tau-squared values (range 0.01–0.05). Forest plots showed no significant discrepancies.

Conclusion: ChatGPT demonstrates modest to moderate accuracy in screening and study selection tasks, but performs well in data pooling and meta-analytic calculations. These findings underscore the potential of AI to augment systematic review methodologies, while also emphasizing the need for human oversight to ensure accuracy and integrity in research workflows.
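The outcome measures named in the abstract can be illustrated with a short sketch. The first function derives accuracy, sensitivity, specificity, positive predictive value, and negative predictive value from a 2x2 table of AI decisions against the human gold standard. The second pools mean differences and reports tau-squared and I². The abstract does not say which pooling estimator was used, so the DerSimonian-Laird random-effects method here is an assumption, as are the function names `screening_metrics` and `pool_random_effects`. All numbers in the usage lines are hypothetical and are not taken from the study.

```python
import math


def screening_metrics(tp, fp, fn, tn):
    """Score AI include/exclude decisions against the human gold standard.

    tp: included by both; fp: included by AI only;
    fn: included by humans only; tn: excluded by both.
    """
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


def pool_random_effects(effects, variances):
    """Pool mean differences with the DerSimonian-Laird estimator (assumed)."""
    k = len(effects)
    if k < 2:
        raise ValueError("need at least two studies to estimate heterogeneity")

    # Fixed-effect (inverse-variance) estimate, needed for Cochran's Q.
    w = [1.0 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1

    # Between-study variance (tau-squared), truncated at zero.
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)

    # I-squared: share of total variability attributed to heterogeneity.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    # Random-effects weights add tau-squared to each study variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return {
        "pooled": pooled,
        "ci_low": pooled - 1.96 * se,
        "ci_high": pooled + 1.96 * se,
        "tau2": tau2,
        "i2": i2,
    }


# Hypothetical counts and effect sizes, for illustration only.
metrics = screening_metrics(tp=40, fp=20, fn=10, tn=130)
pooled = pool_random_effects(effects=[0.0, 2.0], variances=[1.0, 1.0])
```

Comparing the `tau2` value from a sketch like this against a reference implementation is one way to surface small discrepancies of the kind the abstract reports, since different estimators (for example restricted maximum likelihood) give slightly different tau-squared values on the same data.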