Turkey
Medicine
Examination (biology)
Medical physics
Dental education
Medical education
Artificial intelligence
Computer science
Philosophy
Linguistics
Paleontology
Biology
Authors
Soner Şişmanoğlu, Belen Şirinoğlu Çapan
Identifier
DOI:10.1186/s12909-024-06389-9
Abstract
AI-powered chatbots have spread to various fields, including dental education and clinical assistance in treatment planning. The aim of this study is to assess and compare the performance of leading AI-powered chatbots on the dental specialization exam (DUS) administered in Turkey, and to compare it with that of the year's best performer. DUS questions from 2020 and 2021 were posed to ChatGPT-4.0 and Gemini Advanced individually; the questions were manually entered into each AI-powered chatbot in their original form, in Turkish. The results were compared with each other and with the year's best performers. Candidates who score at least 45 points on this centralized exam are deemed to have passed and are eligible to select their preferred department and institution. The data were statistically analyzed using Pearson's chi-squared test (p < 0.05). ChatGPT-4.0 achieved a correct response rate of 83.3% on the 2020 exam, while Gemini Advanced achieved 65%. On the 2021 exam, ChatGPT-4.0 achieved a correct response rate of 80.5%, whereas Gemini Advanced achieved 60.2%. ChatGPT-4.0 outperformed Gemini Advanced on both exams (p < 0.05). Both AI-powered chatbots scored lower overall (for 2020: ChatGPT-4.0, 65.5 and Gemini Advanced, 50.1; for 2021: ChatGPT-4.0, 65.6 and Gemini Advanced, 48.6) than the best performer of the corresponding year (68.5 points for 2020 and 72.3 points for 2021). This underperformance extended to the basic sciences and clinical sciences sections (p < 0.001). Additionally, periodontology was the clinical specialty in which both AI-powered chatbots achieved the best results, while the lowest performance was observed in endodontics and orthodontics. The AI-powered chatbots, namely ChatGPT-4.0 and Gemini Advanced, passed the DUS by exceeding the threshold score of 45. However, they still lagged behind the top performers of their respective years, particularly in basic sciences, clinical sciences, and overall score. Additionally, they exhibited lower performance in some clinical specialties such as endodontics and orthodontics.
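The abstract reports only percentage correct-response rates and a significance threshold, so the following is a minimal sketch in Python of how a Pearson's chi-squared comparison of this kind could be run. The exam length N and the derived correct/incorrect counts are illustrative assumptions, not figures from the paper.

```python
# Sketch of a Pearson's chi-squared test on a 2x2 contingency table of
# correct/incorrect answers per chatbot, per exam year.
# NOTE: N is a hypothetical exam length; the abstract gives only percentages.
from scipy.stats import chi2_contingency

N = 120  # assumed number of questions per exam (illustrative only)

def counts(rate, n=N):
    """Convert a correct-response rate into [correct, incorrect] counts."""
    correct = round(rate * n)
    return [correct, n - correct]

# Correct-response rates reported in the abstract: (ChatGPT-4.0, Gemini Advanced)
rates = {2020: (0.833, 0.650), 2021: (0.805, 0.602)}

for year, (gpt, gemini) in rates.items():
    table = [counts(gpt), counts(gemini)]
    chi2, p, dof, _ = chi2_contingency(table)
    print(f"{year}: chi2 = {chi2:.2f}, dof = {dof}, p = {p:.4f}")
```

Under these assumed counts, both years yield p < 0.05, consistent with the abstract's finding that ChatGPT-4.0 outperformed Gemini Advanced on both exams.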