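Generalizability theory
Usability
Computer science
Bridge (graph theory)
Data science
Qualitative research
Artificial intelligence
Human-computer interaction
Research design
Data collection
Psychology
Bridging (networking)
Health informatics
Natural language processing
Management science
Translational research
MEDLINE
Knowledge management
Language understanding
Qualitative property
Clinical study design
Process management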
Authors
Jiwon You,Hangsik Shin
Identifier
DOI:10.1093/jamia/ocaf230
Abstract
BACKGROUND: Despite rapid integration into clinical decision-making, clinical large language models (LLMs) face substantial translational barriers due to insufficient structural characterization and limited external validation.
OBJECTIVE: We systematically map the clinical LLM research landscape to identify key structural patterns influencing their readiness for real-world clinical deployment.
METHODS: We identified 73 clinical LLM studies published between January 2020 and March 2025 using a structured evidence-mapping approach. To ensure transparency and reproducibility in study selection, we followed key principles from the PRISMA 2020 framework. Each study was categorized by clinical task, base architecture, alignment strategy, data type, language, study design, validation methods, and evaluation metrics.
RESULTS: Studies often addressed multiple early-stage clinical tasks (question answering, 56.2%; knowledge structuring, 31.5%; disease prediction, 43.8%), primarily using text data (52.1%) and English-language resources (80.8%). GPT models favored retrieval-augmented generation (43.8%), and LLaMA models consistently adopted multistage pretraining and fine-tuning strategies. Only 6.9% of studies included external validation, and prospective designs were observed in just 4.1% of cases, reflecting significant gaps in translational reliability. Evaluations were predominantly quantitative only (79.5%), though qualitative and mixed-method approaches are increasingly recognized for assessing clinical usability and trustworthiness.
CONCLUSION: Clinical LLM research remains exploratory, marked by limited generalizability across languages, data types, and clinical environments. To bridge this gap, future studies must prioritize multilingual and multimodal training, prospective study designs with rigorous external validation, and hybrid evaluation frameworks combining quantitative performance with qualitative clinical usability metrics.