How Does ChatGPT Use Source Information Compared With Google? A Text Network Analysis of Online Health Information

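Medicine, Information source (mathematics), Health information, Information retrieval, Internet, World Wide Web, Data science, Health care, Statistics, Mathematics, Computer science, Economics, Economic growth
Authors
Oscar Shen, Jayanth Sairam Pratap, Xiang Li, Neal C. Chen, Abhiram R. Bhashyam
Source
Journal: Clinical Orthopaedics and Related Research [Ovid Technologies (Wolters Kluwer)]
Volume (issue): 482 (4): 578-588
Identifier
DOI: 10.1097/corr.0000000000002995
Abstract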

Background
The lay public is increasingly using ChatGPT (a large language model) as a source of medical information. Traditional search engines such as Google provide several distinct responses to each search query and indicate the source for each response, but ChatGPT provides responses in paragraph form in prose without providing the sources used, which makes it difficult or impossible to ascertain whether those sources are reliable. One practical method to infer the sources used by ChatGPT is text network analysis. By understanding how ChatGPT uses source information in relation to traditional search engines, physicians and physician organizations can better counsel patients on the use of this new tool.

Questions/purposes
(1) In terms of key content words, how similar are ChatGPT and Google Search responses for queries related to topics in orthopaedic surgery? (2) Does the source distribution (academic, governmental, commercial, or form of a scientific manuscript) differ for Google Search responses based on the topic’s level of medical consensus, and how is this reflected in the text similarity between ChatGPT and Google Search responses? (3) Do these results vary between different versions of ChatGPT?

Methods
We evaluated three search queries relating to orthopaedic conditions: “What is the cause of carpal tunnel syndrome?,” “What is the cause of tennis elbow?,” and “Platelet-rich plasma for thumb arthritis?” These were selected because of their relatively high, medium, and low consensus in the medical evidence, respectively. Each question was posed to ChatGPT version 3.5 and version 4.0 20 times for a total of 120 responses. Text network analysis using term frequency–inverse document frequency (TF-IDF) was used to compare text similarity between responses from ChatGPT and Google Search. In the field of information retrieval, TF-IDF is a weighted statistical measure of the importance of a key word to a document in a collection of documents.
Higher TF-IDF scores indicate greater similarity between two sources. TF-IDF scores are most often used to compare and rank the text similarity of documents. Using this type of text network analysis, text similarity between ChatGPT and Google Search can be determined by calculating and summing the TF-IDF for all keywords in a ChatGPT response and comparing it with each Google Search result to assess their text similarity to each other. In this way, text similarity can be used to infer relative content similarity. To answer our first question, we characterized the text similarity between ChatGPT and Google Search responses by finding the TF-IDF scores of the ChatGPT response and each of the 20 Google Search results for each question. Using these scores, we could compare the similarity of each ChatGPT response to the Google Search results. To provide a reference point for interpreting TF-IDF values, we generated randomized text samples with the same term distribution as the Google Search results. By comparing the ChatGPT TF-IDF scores with those of the random text samples, we could assess whether the observed TF-IDF values differed significantly from values obtained by random chance, which allowed us to test whether text similarity was an appropriate quantitative statistical measure of relative content similarity. To answer our second question, we classified the Google Search results to better understand sourcing. Google Search provides 20 or more distinct sources of information, but ChatGPT gives only a single prose paragraph in response to each query. So, to answer this question, we used TF-IDF to ascertain whether the ChatGPT response was principally driven by one of four source categories: academic, government, commercial, or material that took the form of a scientific manuscript but was not peer-reviewed or indexed on a government site (such as PubMed). We then compared the TF-IDF similarity between ChatGPT responses and the source category.
To answer our third research question, we repeated both analyses and compared the results when using ChatGPT 3.5 versus ChatGPT 4.0.

Results
The ChatGPT response was dominated by the top Google Search result. For example, for carpal tunnel syndrome, the top result was an academic website with a mean TF-IDF of 7.2. A similar result was observed for the other search topics. To provide a reference point for interpreting TF-IDF values, a randomly generated sample of text compared with Google Search would have a mean TF-IDF of 2.7 ± 1.9, controlling for text length and keyword distribution. The observed TF-IDF distribution was higher for ChatGPT responses than for random text samples, supporting the claim that keyword text similarity is a measure of relative content similarity. When comparing source distribution, the ChatGPT response was most similar to the most common source category from Google Search. For the subject where there was strong consensus (carpal tunnel syndrome), the ChatGPT response was most similar to high-quality academic sources rather than lower-quality commercial sources (TF-IDF 8.6 versus 2.2). For topics with low consensus, the ChatGPT response paralleled lower-quality commercial websites compared with higher-quality academic websites (TF-IDF 14.6 versus 0.2). ChatGPT 4.0 had higher text similarity to Google Search results than ChatGPT 3.5 (mean increase in TF-IDF similarity of 0.80 to 0.91; p < 0.001). The ChatGPT 4.0 response was still dominated by the top Google Search result and reflected the most common search category for all search topics.

Conclusion
ChatGPT responses are similar to individual Google Search results for queries related to orthopaedic surgery, but the distribution of source information can vary substantially based on the relative level of consensus on a topic.
For example, for carpal tunnel syndrome, where there is widely accepted medical consensus, ChatGPT responses had higher similarity to academic sources and therefore used those sources more. When fewer academic or government sources are available, especially in our search related to platelet-rich plasma, ChatGPT appears to have relied more heavily on a small number of nonacademic sources. These findings persisted even as ChatGPT was updated from version 3.5 to version 4.0.

Clinical Relevance
Physicians should be aware that ChatGPT and Google likely use the same sources for a specific question. The main difference is that ChatGPT can draw upon multiple sources to create one aggregate response, while Google maintains its distinctness by providing multiple results. For topics with a low consensus and therefore a low number of quality sources, there is a much higher chance that ChatGPT will use less-reliable sources, in which case physicians should take the time to educate patients on the topic or provide resources that give more reliable information. Physician organizations should make it clear when the evidence is limited so that ChatGPT can reflect the lack of quality information or evidence.
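
The abstract describes the TF-IDF comparison only in words, so the following is a minimal Python sketch of one way such scoring could work, not the authors' implementation. It rests on several assumptions: the tokenizer, the smoothed IDF formula, the length-normalized term frequency, and every function name (`tokenize`, `idf_weights`, `tfidf_similarity`, `rank_sources`, `random_baseline`) are hypothetical choices made for illustration. For that reason the scores it produces are not on the same scale as the values reported in the abstract.

```python
import math
import random
import re
import statistics
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens (a simplifying assumption)."""
    return re.findall(r"[a-z]+", text.lower())


def idf_weights(documents):
    """Smoothed inverse document frequency for each term in the collection."""
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(tokenize(doc)))
    return {term: math.log((1 + n_docs) / (1 + df)) + 1
            for term, df in doc_freq.items()}


def tfidf_similarity(response, document, idf):
    """Sum tf * idf in `document` over the distinct terms of `response`."""
    tf = Counter(tokenize(document))
    n_tokens = sum(tf.values()) or 1
    return sum((tf[term] / n_tokens) * idf.get(term, 0.0)
               for term in set(tokenize(response)))


def rank_sources(response, documents):
    """Return (document index, score) pairs, most similar document first."""
    idf = idf_weights(documents)
    scores = [tfidf_similarity(response, doc, idf) for doc in documents]
    return sorted(enumerate(scores), key=lambda pair: pair[1], reverse=True)


def random_baseline(documents, length, n_samples=100, seed=0):
    """Mean and standard deviation of scores for random text drawn from the
    pooled term distribution of the documents (assumed baseline design)."""
    rng = random.Random(seed)
    pool = [tok for doc in documents for tok in tokenize(doc)]
    idf = idf_weights(documents)
    scores = []
    for _ in range(n_samples):
        sample = " ".join(rng.choices(pool, k=length))
        scores.extend(tfidf_similarity(sample, doc, idf) for doc in documents)
    return statistics.mean(scores), statistics.stdev(scores)


if __name__ == "__main__":
    # Toy stand-ins for search results; these are not data from the study.
    results = [
        "Carpal tunnel syndrome is caused by pressure on the median nerve in the wrist",
        "Tennis elbow is caused by overuse of the forearm muscles and tendons",
        "Platelet rich plasma injections are marketed for thumb arthritis",
    ]
    answer = "Carpal tunnel syndrome results from compression of the median nerve at the wrist"
    print(rank_sources(answer, results))
    print(random_baseline(results, length=len(tokenize(answer))))
```

In this sketch the response acts as a query against each search result, and the random baseline draws tokens with replacement from the pooled results so that its term distribution and length match the comparison being made. A response whose score sits well above the baseline mean would, under these assumptions, share more key terms with that source than chance alone would explain.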