Evaluating the Effectiveness of Artificial Intelligence–powered Large Language Models Application in Disseminating Appropriate and Readable Health Information in Urology

Dissemination, Artificial Intelligence, Data Science, Computer Science, Telecommunications

Authors

Ryan J. Davis, Michael Eppler, Oluwatobiloba Ayo‐Ajibola, Jeffrey Loh-Doyle, Jamal Nabhani, Mary K. Samplaski, Inderbir S. Gill, Giovanni Cacciamani

Source

Journal: The Journal of Urology [Lippincott Williams & Wilkins]
Date: 2023-07-10   Volume (Issue): Pages: 210 (4): 688-694   Citations: 64

Links

nih.gov, doi.org

Identifiers

DOI: 10.1097/ju.0000000000003615

Abstract

Journal of Urology, New Technology and Techniques, 1 Oct 2023

Author affiliations
Ryan Davis (ORCID 0009-0002-0408-8380), Michael Eppler (ORCID 0000-0001-6336-5857), Oluwatobiloba Ayo-Ajibola, Inderbir Gill, and Giovanni E. Cacciamani: USC Institute of Urology, and Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, California; AI Center at USC Urology, USC Institute of Urology, University of Southern California, Los Angeles, California.
Jeffrey C. Loh-Doyle (ORCID 0000-0002-7094-482X), Jamal Nabhani, and Mary Samplaski: USC Institute of Urology, and Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, California.

Correspondence: Giovanni E. Cacciamani, Catherine and Joseph Aresty Department of Urology, University of Southern California, 1441 Eastlake Ave, Los Angeles, CA 90033; telephone: 626-491-1531; e-mail: [email protected]

Purpose: The Internet is a ubiquitous source of medical information, and natural language processors are gaining popularity as alternatives to traditional search engines. However, the suitability of their generated content for patients is not well understood. We aimed to evaluate the appropriateness and readability of natural language processor-generated responses to urology-related medical inquiries.

Materials and Methods: Eighteen patient questions were developed based on Google Trends and entered as inputs into ChatGPT. Three categories of conditions were assessed: oncologic, benign, and emergency. Within each category, questions concerned either treatment or signs/symptoms. Three native English-speaking, board-certified urologists independently assessed the appropriateness of ChatGPT outputs for patient counseling, using accuracy, comprehensiveness, and clarity as proxies for appropriateness. Readability was assessed using the Flesch Reading Ease and Flesch-Kincaid Reading Grade Level formulas. Additional quality measures, derived from validated tools, were assessed by 3 independent reviewers.

Results: Fourteen of 18 (77.8%) responses were deemed appropriate, with clarity receiving the most scores of 4 and 5 (P = .01). There was no significant difference in the appropriateness of responses between treatment and symptom questions or among condition categories. The most common reason urologists gave for low scores was missing information, sometimes vital information. The mean (SD) Flesch Reading Ease score was 35.5 (10.2), and the mean (SD) Flesch-Kincaid Reading Grade Level was 13.5 (1.74). Additional quality assessment scores showed no significant differences among condition categories.

Conclusions: Despite impressive capabilities, natural language processors have limitations as sources of medical information. Refinement is crucial before adoption for this purpose.
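The two readability formulas named in the abstract are closed-form expressions over word, sentence, and syllable counts. The Python sketch below assumes the commonly published coefficients (Flesch Reading Ease: 206.835 − 1.015 × words/sentences − 84.6 × syllables/words; Flesch-Kincaid Grade Level: 0.39 × words/sentences + 11.8 × syllables/words − 15.59) and Flesch's conventional interpretation bands. The helpers `count_syllables` and `text_stats` are hypothetical, deliberately rough heuristics for illustration only; they are not the tool the authors used, and dedicated readability software will give somewhat different counts.

```python
import re


def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease: higher scores mean easier text (roughly 0-100)."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level: approximate U.S. school grade required."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def reading_ease_band(score: float) -> str:
    """Map a Reading Ease score to Flesch's conventional difficulty bands."""
    bands = [
        (90, "very easy"),
        (80, "easy"),
        (70, "fairly easy"),
        (60, "standard"),
        (50, "fairly difficult"),
        (30, "difficult"),
    ]
    for lower_bound, label in bands:
        if score >= lower_bound:
            return label
    return "very difficult"


def count_syllables(word: str) -> int:
    """Hypothetical heuristic: count vowel groups, drop a silent final 'e'
    (but keep '-le' endings such as 'table'); every word has at least 1."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def text_stats(text: str) -> tuple[int, int, int]:
    """Hypothetical tokenizer: sentences end at runs of '.', '!' or '?';
    words are runs of letters and apostrophes. Returns (words, sentences, syllables)."""
    sentences = max(len(re.findall(r"[.!?]+", text)), 1)
    words = re.findall(r"[A-Za-z']+", text)
    if not words:
        raise ValueError("text contains no words")
    syllables = sum(count_syllables(w) for w in words)
    return len(words), sentences, syllables


if __name__ == "__main__":
    sample = "Kidney stones can cause severe flank pain. See a doctor if you have a fever."
    w, s, syl = text_stats(sample)
    fre = flesch_reading_ease(w, s, syl)
    print(round(fre, 1), reading_ease_band(fre), round(flesch_kincaid_grade(w, s, syl), 1))
```

Under these conventional bands, the reported mean Reading Ease of 35.5 falls in the "difficult" range, consistent with the reported mean grade level of 13.5, which is above typical high-school reading level.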
Support: None.
Conflict of Interest: Inderbir Gill: Oneline Health: Equity. The remaining authors have no conflicts of interest to disclose.
Ethics Statement: All human subjects provided written informed consent with guarantees of confidentiality.
© 2023 by American Urological Association Education and Research, Inc.

Cited by
Regala J and Siemens D (2023) Who Is an Author? Finding the Balance Between Contribution and Accountability. Journal of Urology, Vol. 210, No. 6, pp. 830-832. Online publication date: 1-Dec-2023.
Cacciamani G, Siemens D and Gill I (2023) Generative Artificial Intelligence in Health Care. Journal of Urology, Vol. 210, No. 5, pp. 723-725. Online publication date: 1-Nov-2023.
Cacciamani G (2023) Evaluating the Effectiveness of Artificial Intelligence–powered Large Language Models Application in Disseminating Appropriate and Readable Health Information in Urology. Reply. Journal of Urology, Vol. 210, No. 5, pp. 736-737. Online publication date: 1-Nov-2023.
Di H and Wen Y (2023) Evaluating the Effectiveness of Artificial Intelligence–powered Large Language Models Application in Disseminating Appropriate and Readable Health Information in Urology. Letter. Journal of Urology, Vol. 210, No. 5, pp. 735-736. Online publication date: 1-Nov-2023.
Siemens D (2023) This Month in Adult Urology. Journal of Urology, Vol. 210, No. 4, pp. 573-574. Online publication date: 1-Oct-2023.
Keywords: artificial intelligence; communication; health; urology; signs and symptoms; therapeutics