Automated evidence surveillance with AI-enabled pre-ranking with cutoff in living guideline maintenance: a simulation study

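Guideline, Medicine, Cutoff, Medical physics, MEDLINE, Medical emergency, Intensive care medicine, Emergency medicine, Public health, Epidemiology, Risk assessment, Computer science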

Authors

Darren Rajit, Steve McDonald, Lan Du, Helena Teede, Joanne Enticott

Source

Journal: Journal of Clinical Epidemiology [Elsevier BV]
Date: 2026-05-01; Volume/Issue: (not given); Pages: 112349-112349

Links

nih.gov, doi.org

Identifiers

DOI: 10.1016/j.jclinepi.2026.112349

Abstract

OBJECTIVES: Living guidelines are an emerging approach to ensure timely synthesis of the research evidence. However, pragmatic methods for maintenance are needed to ensure sustainability. Our study aimed to simulate and evaluate the performance and efficiency of various single database evidence retrieval workflows augmented by AI-enabled pre-ranking with cutoff for living guideline development and maintenance.

METHODS: A retrospective simulation study was conducted using data from the 2023 International Polycystic Ovary Syndrome Guidelines. Simulations were run across four databases (Medline, Embase, PubMed and OpenAlex) to identify the peer-reviewed articles included in the guidelines. Workflows were evaluated at the guideline (all articles) and topic level. Single database topic-specific searches were compared against single database overarching searches. The performance of overarching searches with AI-enabled pre-ranking with cutoff at guideline and topic level was also evaluated. Metrics included recall, precision, F score, number of articles needed to screen per relevant study (NNR) and overall screening workload.

RESULTS: Across 38 eligible topics (854 articles), overarching searches outperformed topic-specific searches at guideline level for both recall (92% to 96% versus 76% to 89%) and efficiency, reducing overall screening workload by 63% to 70%, and requiring teams to screen 28 to 48 articles per relevant study versus 76 to 160 between comparable databases (Embase, Medline). At individual topic level, topic-specific searches were more efficient than overarching searches integrated with topic-specific rankings. However, topic-specific searches had significantly lower recall (p<0.01) in comparison. AI-enabled ranking provided only marginal efficiency gains at guideline level (3% to 21% NNR reduction) compared to topic level (85% to 95% NNR reduction). Lastly, performance of automated article retrieval via PubMed API was equivalent to manual retrieval via Ovid Medline.

CONCLUSION: Single database overarching searches outperform single database topic-specific searches and should be considered during guideline maintenance when most of the guideline needs updating. While topic-specific searches may be more efficient in instances where only a few areas need to be updated, using a single database approach may result in lower recall. Single database overarching searches integrated with topic-specific rankings can be considered in such cases.
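
The metrics named in the abstract (recall, precision, F score, NNR) can be illustrated with a minimal sketch of scoring a pre-ranked search result screened only up to a cutoff. This is not the authors' code: the function name `evaluate_with_cutoff`, the `ScreeningMetrics` container, and the toy article IDs are hypothetical, the F score is assumed to be the balanced F1, and NNR is computed as articles screened divided by relevant articles found, following the abstract's description.

```python
from dataclasses import dataclass
from typing import Iterable, Sequence


@dataclass
class ScreeningMetrics:
    screened: int      # number of articles a reviewer would screen
    recall: float      # relevant found / all relevant
    precision: float   # relevant found / screened
    f1: float          # balanced F score (assumed; the abstract says only "F score")
    nnr: float         # articles screened per relevant article found


def evaluate_with_cutoff(
    ranked_ids: Sequence[str],
    relevant_ids: Iterable[str],
    cutoff: int,
) -> ScreeningMetrics:
    """Score a ranked result list when only the top `cutoff` items are screened.

    Assumes `ranked_ids` holds unique IDs ordered from most to least relevant
    and `relevant_ids` holds the IDs of the articles actually included.
    """
    relevant = set(relevant_ids)
    screened = list(ranked_ids[:cutoff])
    hits = sum(1 for article_id in screened if article_id in relevant)

    recall = hits / len(relevant) if relevant else 0.0
    precision = hits / len(screened) if screened else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    nnr = len(screened) / hits if hits else float("inf")
    return ScreeningMetrics(len(screened), recall, precision, f1, nnr)


# Toy data: eight retrieved articles, four truly relevant ("z" was never retrieved).
ranked = ["a", "b", "c", "d", "e", "f", "g", "h"]
included = {"a", "c", "f", "z"}

top4 = evaluate_with_cutoff(ranked, included, cutoff=4)
full = evaluate_with_cutoff(ranked, included, cutoff=len(ranked))
print(top4)
print(full)
```

In this toy case, tightening the cutoff lowers NNR (less screening per relevant article) at the cost of recall, which is the trade-off the simulation quantifies at guideline and topic level.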
