Medicine
Delirium
Dementia
Readability
Vascular dementia
Intensive care medicine
Gerontology
Internal medicine
Disease
Linguistics
Philosophy
Authors
Jamila Tukur Jido, Ahmed Al-Wizni, Su Aung
Source
Journal: Cureus
[Cureus, Inc.]
Date: 2025-06-06
Abstract
Background: Large language models such as ChatGPT, DeepSeek, and Gemini are increasingly used to generate patient-facing medical content. While their factual accuracy has been explored, the readability of their outputs remains less well understood. Readability is a crucial component of health communication, particularly for older adults and people with lower health literacy. This study aimed to evaluate and compare the readability of patient information leaflets generated by three large language models (ChatGPT, DeepSeek, and Gemini) on the topics of Alzheimer's disease, vascular dementia, and delirium, using five validated readability metrics.

Materials and methods: We conducted a cross-sectional comparative study of patient information leaflets generated by the three models on Alzheimer's disease, vascular dementia, and delirium. Each model was prompted with identical queries, and the resulting texts were evaluated using five established readability metrics: Flesch Reading Ease, Flesch-Kincaid Grade Level, Gunning Fog Index, Simple Measure of Gobbledygook (SMOG) Index, and Automated Readability Index. Readability scores were compared using Kruskal-Wallis tests to identify statistically significant differences among the models.

Results: ChatGPT consistently produced the most readable content, with the highest Flesch Reading Ease scores and the lowest grade-level indices. DeepSeek generated markedly more complex, less accessible text. Gemini performed intermediately, sometimes matching ChatGPT on specific indices but not consistently across all metrics. The difference in Flesch Reading Ease scores between models was statistically significant (H = 7.20, p = 0.027); the other metrics showed trends that approached significance.

Conclusions: There are meaningful differences in the readability of patient information generated by different large language models. ChatGPT appears to produce content more suitable for patient understanding, particularly in the context of older adult care. These findings highlight the need for careful evaluation of readability when using generative AI in clinical communication. Future research should incorporate expert review of content accuracy and appropriateness alongside readability.
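As an illustration of the pipeline the abstract describes, the sketch below scores per-model texts on the five named readability metrics and compares the models with a Kruskal-Wallis test per metric. It assumes the Python `textstat` and `scipy` packages; the leaflet strings are hypothetical placeholders, not the study's actual model outputs, and this is not the authors' analysis code.

```python
# Minimal sketch of the analysis in the abstract: score each leaflet on
# five readability metrics, then compare the three models per metric
# with a Kruskal-Wallis test.
# Requires: pip install textstat scipy
import textstat
from scipy.stats import kruskal

# Hypothetical placeholder texts (one per topic: Alzheimer's disease,
# vascular dementia, delirium); the study used full LLM-generated leaflets.
leaflets = {
    "ChatGPT": [
        "Alzheimer's disease slowly affects memory. Getting help early matters.",
        "Vascular dementia follows damage to blood vessels in the brain.",
        "Delirium is sudden confusion. It often improves with treatment.",
    ],
    "DeepSeek": [
        "Alzheimer's disease is a progressive neurodegenerative disorder.",
        "Vascular dementia results from cerebrovascular pathology.",
        "Delirium is an acute, fluctuating disturbance of attention.",
    ],
    "Gemini": [
        "Alzheimer's disease gradually changes memory and thinking.",
        "Vascular dementia happens when blood flow to the brain is reduced.",
        "Delirium is a sudden change in awareness and attention.",
    ],
}

# The five validated metrics named in the abstract, via textstat. For example,
# Flesch Reading Ease = 206.835 - 1.015*(words/sentence) - 84.6*(syllables/word).
metrics = {
    "Flesch Reading Ease": textstat.flesch_reading_ease,
    "Flesch-Kincaid Grade Level": textstat.flesch_kincaid_grade,
    "Gunning Fog Index": textstat.gunning_fog,
    "SMOG Index": textstat.smog_index,
    "Automated Readability Index": textstat.automated_readability_index,
}

for name, score in metrics.items():
    # One score per leaflet, grouped by model.
    groups = [[score(text) for text in texts] for texts in leaflets.values()]
    h, p = kruskal(*groups)  # nonparametric comparison across the three models
    print(f"{name}: H = {h:.2f}, p = {p:.3f}")
```

Note that a higher Flesch Reading Ease means easier text, while the four grade-level indices run in the opposite direction (lower is easier), which is why the abstract reports ChatGPT as both highest on Flesch Reading Ease and lowest on the grade-level indices.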