Accuracy and Reproducibility of ChatGPT Responses to Breast Cancer Tumor Board Patients

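Concordance; Breast cancer; Medicine; Reproducibility; Cancer; Oncology; Internal medicine; Medical physics; Statistics; Mathematics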

Authors

Ning Liao, Cheukfai Li, William J. Gradishar, V. Suzanne Klimberg, Joshua Roshal, Tai-Ze Yuan, Sanjiv S. Agarwala, V Valero, Sandra M. Swain, Julie A. Margenthaler, Isabel T. Rubio, Sara A. Hurvitz, Charles E. Geyer, Nancy U. Lin, Hope S. Rugo, Guochun Zhang, N. Liu, Charles M. Balch

Source

Journal: JCO Clinical Cancer Informatics [Lippincott Williams & Wilkins]
Date: 2025-06-01 Volume/Issue: (9)

Links

nih.gov, doi.org

Identifier

DOI: 10.1200/cci-25-00001

Abstract

PURPOSE: We assessed the accuracy and reproducibility of Chat Generative Pre-Trained Transformer (ChatGPT) recommendations for breast cancer patient cases by comparing the generated outputs with consensus expert opinions.

METHODS: A total of 362 consecutive breast cancer patient cases sourced from a weekly international breast cancer webinar series were submitted to a tumor board of renowned experts. The same 362 clinical cases were also submitted as prompts to ChatGPT-4.0 three separate times to examine reproducibility.

RESULTS: Only 46% of ChatGPT-generated content was entirely concordant with the recommendations of breast cancer experts, and only 39% of ChatGPT's responses demonstrated inter-response similarity. ChatGPT's responses demonstrated higher concordance with CEN experts in earlier stages of breast cancer (0, I, II, III) than in advanced-stage (IV) cases (P = .019). ChatGPT gave fewer accurate responses for cases involving molecular markers and genetic testing (P = .025) and for cases involving antibody-drug conjugates (P = .006). ChatGPT's responses were not necessarily incorrect but often omitted specific details about clinical management. When the same prompt was independently submitted to the model on three occasions, each time by a different user, ChatGPT's responses exhibited variable content and formatting in 68% (246 of 362) of cases and were entirely consistent with one another in only 32%.

CONCLUSION: Because this promising clinical decision support tool is already widely used by physicians worldwide, users need to understand its limitations as currently constructed when it responds to multidisciplinary breast cancer cases, and researchers in the field should continue to improve it with contemporary, accurate, and complete breast cancer information.
As currently constructed, ChatGPT is not engineered to generate identical outputs for the same input, and it was less likely to correctly interpret and recommend treatments for complex breast cancer cases.
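
The reproducibility percentages in the abstract follow from simple proportions. The sketch below is only an illustrative check of that arithmetic, not the authors' analysis code: the function name `share` and the variable names are hypothetical, and the count of fully consistent cases (116) is derived here on the assumption that the "variable" and "entirely consistent" groups together cover all 362 cases.

```python
def share(count: int, total: int) -> int:
    """Return count/total as a percentage rounded to the nearest whole number."""
    return round(100 * count / total)


total_cases = 362     # cases prompted to ChatGPT (from the abstract)
variable_cases = 246  # cases with variable content and formatting (from the abstract)

# Assumption: every case is either "variable" or "entirely consistent".
consistent_cases = total_cases - variable_cases

variable_pct = share(variable_cases, total_cases)
consistent_pct = share(consistent_cases, total_cases)

print(f"variable: {variable_pct}%, consistent: {consistent_pct}%")
```

Under that assumption, the two rounded shares match the 68% and 32% figures reported in the abstract.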
