Scientific Writing in the Era of Large Language Models: A Computational Analysis of AI Versus Human-Created Content.

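Medicine; Content (measure theory); Natural language processing; Linguistics; Mathematics; Computer science; Mathematical analysis; Philosophy
Authors
Rohan Khera, Aline F Pedroso, Vipina K. Keloth, Hua Xu, Gisele Sampaio Silva, Lee H. Schwamm
Source
Journal: PubMed
Identifier
DOI: 10.1161/strokeaha.125.051913
Abstract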

Large language models (LLMs) are artificial intelligence (AI) tools that can generate human expert-like content and be used to accelerate the synthesis of scientific literature, but they can spread misinformation by producing misleading content. This study sought to characterize the linguistic features that distinguish AI-generated from human-authored scientific text and to evaluate the performance of AI detection tools for this task. We conducted a computational synthesis of 34 essays on cerebrovascular topics (12 generated by large language models [Generative Pre-trained Transformer 4, Generative Pre-trained Transformer 3.5, Llama-2, and Bard] and 22 by human scientists). Each essay was rated as AI-generated or human-authored by up to 38 members of the Stroke editorial board. We compared the collective performance of experts versus GPTZero, a widely used online AI detection tool. We extracted and compared linguistic features spanning syntax (word count, complexity, and so on), semantics (polarity), readability (Flesch scores), grade level (Flesch-Kincaid), and language perplexity (or predictability) to characterize linguistic differences between AI-generated and human-written content. Over 50% of the stroke experts who reviewed the study essays correctly identified 10 of the 12 AI-generated essays (83.3%) as AI, whereas they misclassified 7 of the 22 human-written essays (31.8%) as AI. GPTZero accurately classified all 12 (100%) of the AI-generated essays and 21 of the 22 (95.5%) human-written essays. However, the tool relied on only a few key sentences for classification. Compared with human essays, AI-generated content had lower word count and complexity, significantly lower perplexity (median, 15.0 for human versus 7.2 for AI; P<0.001), lower readability scores (Flesch median, 42.1 versus 26.4; P<0.001), and higher grade level (Flesch-Kincaid median, 13.1 versus 14.8; P=0.006).
Large language models generate scientific content that differs measurably from human-written text, but these differences are not consistently identifiable even by human experts and require complex AI detection tools. Given the challenges that experts face in distinguishing AI from human content, technology-assisted tools are needed wherever human provenance is essential to safeguarding the integrity of scientific communication.
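
The abstract names the readability and perplexity metrics but does not say how they were computed. As a rough illustration only, the sketch below uses the standard published Flesch Reading Ease and Flesch-Kincaid Grade Level formulas together with the usual definition of perplexity as the exponential of the mean negative log-probability per token. All function names are hypothetical, the syllable counter is a crude vowel-group heuristic, and the token log-probabilities are assumed to come from some language model that is not shown here. None of this is claimed to be the study's actual pipeline.

```python
import math
import re
from typing import List, Tuple


def count_syllables(word: str) -> int:
    """Crude heuristic (an assumption, not the study's method):
    count vowel groups and drop a trailing silent 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith(("le", "ee")) and count > 1:
        count -= 1
    return max(count, 1)


def text_stats(text: str) -> Tuple[int, int, int]:
    """Return (sentence count, word count, syllable count)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return len(sentences), len(words), syllables


def flesch_reading_ease(text: str) -> float:
    """Higher score means easier to read."""
    n_sent, n_words, n_syll = text_stats(text)
    return 206.835 - 1.015 * (n_words / n_sent) - 84.6 * (n_syll / n_words)


def flesch_kincaid_grade(text: str) -> float:
    """Approximate US school grade level needed to read the text."""
    n_sent, n_words, n_syll = text_stats(text)
    return 0.39 * (n_words / n_sent) + 11.8 * (n_syll / n_words) - 15.59


def perplexity(token_log_probs: List[float]) -> float:
    """exp(mean negative natural-log probability per token).
    Lower values mean the text is more predictable to the model."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


if __name__ == "__main__":
    sample = "The cat sat on the mat."
    print(flesch_reading_ease(sample))
    print(flesch_kincaid_grade(sample))
    # Four tokens, each assigned probability 0.25 by a hypothetical model.
    print(perplexity([math.log(0.25)] * 4))
```

Under these definitions, the pattern reported in the abstract (lower Flesch score, higher Flesch-Kincaid grade, lower perplexity for AI text) corresponds to longer sentences or more syllables per word, combined with token sequences that a language model finds more predictable.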
