Accuracy and Reproducibility of ChatGPT Responses to Breast Cancer Tumor Board Patients
Consistency
Breast cancer
Medicine
Reproducibility
Cancer
Oncology
Internal medicine
Medical physics
Statistics
Mathematics
Authors
Ning Liao, Cheukfai Li, William J. Gradishar, V. Suzanne Klimberg, Joshua Roshal, Tai-Ze Yuan, Sanjiv S. Agarwala, V. Valero, Sandra M. Swain, Julie A. Margenthaler, Isabel T. Rubio, Sara A. Hurvitz, Charles E. Geyer, Nancy U. Lin, Hope S. Rugo, Guochun Zhang, N. Liu, Charles M. Balch
PURPOSE: We assessed the accuracy and reproducibility of recommendations from Chat Generative Pre-Trained Transformer (ChatGPT) for breast cancer cases by comparing its outputs with consensus expert opinions.

METHODS: A total of 362 consecutive breast cancer cases, sourced from a weekly international breast cancer webinar series, were submitted to a tumor board of renowned experts. The same 362 cases were also entered as prompts into ChatGPT-4.0 on three separate occasions to examine reproducibility.

RESULTS: Only 46% of ChatGPT-generated content was entirely concordant with the recommendations of the breast cancer experts, and only 39% of ChatGPT's responses demonstrated inter-response similarity. ChatGPT's responses showed higher concordance with the experts for earlier-stage breast cancer (stages 0, I, II, and III) than for advanced (stage IV) cases (P = .019). ChatGPT's responses were less accurate for cases involving molecular markers and genetic testing (P = .025) and for cases involving antibody-drug conjugates (P = .006). Its responses were not necessarily incorrect but often omitted specific details about clinical management. When the same prompt was entered independently into the model on three occasions, each time by a different user, ChatGPT's responses varied in content and formatting in 68% (246 of 362) of cases and were entirely consistent with one another in only the remaining 32%.

CONCLUSION: Because this promising clinical decision-support tool is already widely used by physicians worldwide, users need to understand its current limitations in responding to multidisciplinary breast cancer cases, and researchers in the field should continue to improve it with contemporary, accurate, and complete breast cancer information.
As currently constructed, ChatGPT is not engineered to generate identical outputs for the same input, and it was less likely to correctly interpret and recommend treatments for complex breast cancer cases.
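The reproducibility figures reported above reduce to counting over cases. The abstract does not describe the study's scoring rubric, so the sketch below assumes a simplified, hypothetical criterion: each of a case's three responses receives a reviewer label, and the case counts as entirely consistent only when all three labels match. The function names `is_entirely_consistent` and `percent` and the toy labels are illustrative inventions, not study data; the last lines only check the arithmetic behind the reported 68% and 32%.

```python
# Hypothetical sketch: the abstract does not publish the study's scoring method.
# Each case is represented by the reviewer labels assigned to its three ChatGPT
# responses; a case counts as "entirely consistent" only if all labels match.


def is_entirely_consistent(labels: list[str]) -> bool:
    """True when every repeated response for one case received the same label."""
    return len(set(labels)) == 1


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Toy example with invented labels (not study data).
toy_cases = [
    ["A", "A", "A"],  # all three responses agree: consistent
    ["A", "B", "A"],  # one response differs: variable
    ["C", "C", "C"],  # all three responses agree: consistent
]
toy_consistent = sum(is_entirely_consistent(case) for case in toy_cases)
print(toy_consistent)  # 2

# Arithmetic check on the counts reported in the abstract.
TOTAL_CASES = 362     # cases prompted three times each
VARIABLE_CASES = 246  # cases whose responses varied in content or formatting
consistent_cases = TOTAL_CASES - VARIABLE_CASES
print(consistent_cases)                        # 116
print(percent(VARIABLE_CASES, TOTAL_CASES))    # 68.0
print(percent(consistent_cases, TOTAL_CASES))  # 32.0
```

A real reanalysis would replace the exact-equality test with whatever similarity judgment the expert reviewers actually applied; the point here is only that 246 variable cases out of 362 leave 116 fully consistent ones, matching the 68% and 32% split.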