Medicine
Medical diagnosis
Diagnostic accuracy
Modality (human–computer interaction)
Medical imaging
Differential diagnosis
Lesion
Radiology
Pathology
Artificial intelligence
Computer science
Authors
F Hassanein,Ahmed El Barbary,Radwa R. Hussein,Yousra Ahmed,Jylan El‐Guindy,Susan Sarhan,Asmaa Abou‐Bakr
Abstract
Background: AI models such as ChatGPT-4o and DeepSeek-3 show diagnostic promise, but their reliability on complex, image-based oral lesions remains unclear. This study aimed to evaluate and compare the diagnostic accuracy of ChatGPT-4o and DeepSeek-3, despite their differing input modalities, against oral medicine (OM) experts across varied lesion types and case difficulty levels.

Methods: Eighty standardized clinical vignettes derived from real-world oral disease cases, including clinical images/radiographs, were evaluated. Differential diagnoses were generated by ChatGPT-4o, DeepSeek-3, and four board-certified OM specialists, with accuracy assessed at the Top-1, Top-3, and Top-5 levels.

Results: OM specialists consistently achieved the highest diagnostic accuracy. However, DeepSeek-3 significantly outperformed ChatGPT-4o at the Top-3 level (p = 0.0153) and showed greater robustness in high-difficulty and inflammatory cases despite its text-only modality. Multimodal imaging enhanced diagnostic accuracy. Regression analysis identified lesion type and imaging modality as positive predictors, while diagnostic difficulty negatively impacted Top-1 performance.

Conclusions: Remarkably, the text-only DeepSeek-3 model exceeded the diagnostic performance of the multimodal ChatGPT-4o model for complex oral lesions, highlighting its structured reasoning capabilities and reduced hallucination rate. These findings underscore the potential of non-vision LLMs in diagnostic support while emphasizing the critical need for expert oversight in complex scenarios.
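The abstract reports accuracy at the Top-1, Top-3, and Top-5 levels and a paired significance test between the two models. The following Python is a minimal sketch, not the authors' code: it shows how Top-k accuracy over ranked differential-diagnosis lists can be computed, and uses a McNemar test as one plausible choice for the paired Top-3 comparison (the abstract does not name the test used). All case data below are hypothetical placeholders, not cases from the study.

```python
# Sketch: Top-k accuracy over ranked differentials + paired McNemar comparison.
# Hypothetical data; the study's actual cases and statistics may differ.
from statsmodels.stats.contingency_tables import mcnemar

def top_k_accuracy(predictions, ground_truth, k):
    """Fraction of cases whose true diagnosis appears in the top-k ranked list."""
    hits = sum(truth in ranked[:k] for ranked, truth in zip(predictions, ground_truth))
    return hits / len(ground_truth)

# Hypothetical ranked differentials for 4 illustrative vignettes.
truth = ["lichen planus", "pemphigus vulgaris", "OSCC", "candidiasis"]
model_a = [  # e.g., a multimodal model's output
    ["lichen planus", "lupus erythematosus", "candidiasis"],
    ["mucous membrane pemphigoid", "pemphigus vulgaris", "lichen planus"],
    ["leukoplakia", "OSCC", "frictional keratosis"],
    ["leukoplakia", "lichen planus", "geographic tongue"],
]
model_b = [  # e.g., a text-only model's output
    ["lichen planus", "candidiasis", "lupus erythematosus"],
    ["pemphigus vulgaris", "mucous membrane pemphigoid", "erythema multiforme"],
    ["OSCC", "leukoplakia", "traumatic ulcer"],
    ["candidiasis", "lichen planus", "geographic tongue"],
]

for k in (1, 3):
    print(f"Top-{k}: A={top_k_accuracy(model_a, truth, k):.2f}, "
          f"B={top_k_accuracy(model_b, truth, k):.2f}")

# Paired comparison at Top-3: 2x2 table of (A correct?, B correct?) counts.
a_hits = [truth[i] in model_a[i][:3] for i in range(len(truth))]
b_hits = [truth[i] in model_b[i][:3] for i in range(len(truth))]
table = [[sum(a and b for a, b in zip(a_hits, b_hits)),
          sum(a and not b for a, b in zip(a_hits, b_hits))],
         [sum(not a and b for a, b in zip(a_hits, b_hits)),
          sum(not a and not b for a, b in zip(a_hits, b_hits))]]
print(mcnemar(table, exact=True))  # exact binomial test on discordant pairs
```

The McNemar test is appropriate here because the two models are evaluated on the same 80 vignettes, so their correct/incorrect outcomes are paired rather than independent; only the discordant cells (one model right, the other wrong) drive the test statistic.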