Prostate cancer
Radiotherapy
Medicine
Oncology
Medical physics
Cancer
Computer science
Internal medicine
Authors
P. W. Luo, Jiwen Liu, Xin Xie, Jiawei Jiang, Xiaoguang Huo, Zhen-Lin Chen, Zhang-Cheng Huang, Shaoqin Jiang, Mengqiang Li
Abstract
Medical information generated by large language models (LLMs) is becoming important for patient education and clinical decision-making. This study evaluates the performance of two LLMs (DeepSeek and ChatGPT) in answering questions on prostate cancer radiotherapy in Chinese and in English, using a comparative analysis to determine which model provides higher-quality answers in each language. A structured evaluation framework was developed from a set of clinically relevant questions covering three key domains: foundational knowledge, patient education, and treatment and follow-up care. Responses from DeepSeek and ChatGPT were generated in both English and Chinese and independently assessed by a panel of five oncology specialists on a five-point Likert scale. Statistical analyses, including the Wilcoxon signed-rank test, were performed to compare the models' performance across the two linguistic contexts. The study ultimately included 33 questions for scoring. In Chinese, DeepSeek outperformed ChatGPT, achieving top ratings (score = 5) in 75.76% vs. 36.36% of responses (P < 0.001), and excelled in foundational knowledge (76.92% vs. 38.46%, P = 0.047) and treatment/follow-up (81.82% vs. 36.36%, P = 0.031). In English, ChatGPT showed comparable overall performance (66.7% vs. 54.55% top-rated responses, P = 0.236), with a marginal advantage in treatment/follow-up (63.64% vs. 54.55%, P = 0.563). DeepSeek maintained strengths in English foundational knowledge (69.23% vs. 30.77%, P = 0.047) and patient education (88.89% vs. 55.56%, P = 0.125). These findings underscore DeepSeek's superior Chinese proficiency and the impact of language-specific optimization. The study shows that DeepSeek excels at providing Chinese medical information, while the two models perform similarly in an English environment, which underscores the importance of selecting language-specific artificial intelligence (AI) models to enhance the accuracy and reliability of medical AI applications. While both models show promise in supporting patient education and clinical decision-making, human expert review remains necessary to ensure response accuracy and minimize potential misinformation.
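A minimal Python sketch of the paired comparison described in the abstract, assuming scipy is available: expert Likert ratings (1-5) for the same questions answered by the two models are compared with the Wilcoxon signed-rank test, and the share of top-rated (score = 5) responses is tallied. The scores below are illustrative placeholders, not the study's data.

```python
from scipy.stats import wilcoxon

# Hypothetical mean expert rating per question (five-point Likert scale) for
# each model in the same language environment; values are made up.
deepseek_scores = [5, 5, 4, 5, 5, 4, 5, 3, 5, 5, 4]
chatgpt_scores  = [4, 5, 3, 4, 5, 4, 4, 3, 5, 4, 3]

# Wilcoxon signed-rank test on the paired differences; ties (zero differences)
# are dropped by the default zero-handling method.
stat, p_value = wilcoxon(deepseek_scores, chatgpt_scores)
print(f"Wilcoxon statistic = {stat}, p = {p_value:.3f}")

# Proportion of top-rated (score = 5) responses per model, mirroring the
# percentages reported in the abstract.
top_deepseek = sum(s == 5 for s in deepseek_scores) / len(deepseek_scores)
top_chatgpt  = sum(s == 5 for s in chatgpt_scores) / len(chatgpt_scores)
print(f"Top-rated: DeepSeek {top_deepseek:.1%}, ChatGPT {top_chatgpt:.1%}")
```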