Structural insights into clinical large language models and their barriers to translational readiness

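Generalizability theory; Usability; Computer science; Bridge (graph theory); Data science; Qualitative research; Artificial intelligence; Human-computer interaction; Research design; Data collection; Psychology; Bridging (networking); Health informatics; Natural language processing; Management science; Translational research; MEDLINE; Knowledge management; Language understanding; Qualitative property; Clinical study design; Process management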

Authors

Jiwon You, Hangsik Shin

Source

Journal: Journal of the American Medical Informatics Association [Oxford University Press]
Date: 2025-12-16; Volume (Issue): 33 (3): 732-742; Citations: 1

Links

nih.gov, doi.org

Identifiers

DOI: 10.1093/jamia/ocaf230

Abstract

BACKGROUND: Despite rapid integration into clinical decision-making, clinical large language models (LLMs) face substantial translational barriers due to insufficient structural characterization and limited external validation.

OBJECTIVE: We systematically map the clinical LLM research landscape to identify key structural patterns influencing their readiness for real-world clinical deployment.

METHODS: We identified 73 clinical LLM studies published between January 2020 and March 2025 using a structured evidence-mapping approach. To ensure transparency and reproducibility in study selection, we followed key principles from the PRISMA 2020 framework. Each study was categorized by clinical task, base architecture, alignment strategy, data type, language, study design, validation methods, and evaluation metrics.

RESULTS: Studies often addressed multiple early-stage clinical tasks, namely question answering (56.2%), knowledge structuring (31.5%), and disease prediction (43.8%), primarily using text data (52.1%) and English-language resources (80.8%). GPT models favored retrieval-augmented generation (43.8%), and LLaMA models consistently adopted multistage pretraining and fine-tuning strategies. Only 6.9% of studies included external validation, and prospective designs were observed in just 4.1% of cases, reflecting significant gaps in translational reliability. Evaluations were predominantly quantitative only (79.5%), though qualitative and mixed-method approaches are increasingly recognized for assessing clinical usability and trustworthiness.

CONCLUSION: Clinical LLM research remains exploratory, marked by limited generalizability across languages, data types, and clinical environments. To bridge this gap, future studies must prioritize multilingual and multimodal training, prospective study designs with rigorous external validation, and hybrid evaluation frameworks combining quantitative performance with qualitative clinical usability metrics.
