Large Language Models' Ability to Assess Main Concepts in Story Retelling: A Proof-of-Concept Comparison of Human Versus Machine Ratings

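Generalizability theory, Inter-rater reliability, Cronbach's alpha, Computer science, Pearson product-moment correlation coefficient, Aphasia, Active listening, Psychology, Python (programming language), Reliability (semiconductor), Natural language processing, Artificial intelligence, Cognitive psychology, Statistics, Developmental psychology, Psychometrics, Rating scale, Communication, Mathematics, Operating system, Power (physics), Physics, Quantum mechanics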
Authors
Jacquie Kurland, Vishnupriya Varadharaju, Anna Liu, Polly Stokes, Ankita Gupta, Marisa Hudspeth, Brendan T. O’Connor
Source
Journal: American Journal of Speech-language Pathology [American Speech–Language–Hearing Association]
Volume/Issue: 1-11
Identifier
DOI:10.1044/2025_ajslp-24-00400
Abstract

Purpose: Despite an abundance of manual, labor-intensive discourse analysis methods, there remains a dearth of clinically convenient, psychometrically robust instruments to measure change in real-world communication in aphasia. The Brief Assessment of Transactional Success (BATS) addresses this gap while developing automated methods for analyzing story retelling discourse. This study investigated automation of main concept (MC) analysis of stories by comparing scores from three large language models (LLMs) to those of human raters.
Method: After watching/listening to each of the eight short video/audio BATS stimuli and retelling each story, 96 persons with aphasia (PWA; n = 48 female) engaged in topic-constrained conversations over Zoom with 94 familiar and 107 unfamiliar conversation partners (CPs). CPs then retold each story as co-constructed during their conversations with PWA. Audio files from the resulting 1,760 story retells were transcribed using Python and AssemblyAI's speech-to-text application programming interface. Each MC was first scored by human raters for presence, accuracy, and completeness. Raters used a semiautomated application, MainConcept. For each transcript, an MC composite ratio score was obtained. We evaluated three state-of-the-art LLMs: two proprietary models, GPT-4 and GPT-4o, and one open-source model, Llama-3-70B. The interrater reliability between each LLM versus human MC scoring was assessed via the Pearson correlation coefficient and reliability coefficients based on the generalizability theory (G-theory).
Results: The Pearson correlation coefficients indicate strong positive linear relationships between LLM and human MC scores. G-theory reliability coefficients also indicate reliable scoring between LLM and human scoring across the spectrum of participants and conditions.
Conclusions: This promising proof-of-concept study affirms the reliability of three LLMs in evaluating BATS story retell MCs and justifies ongoing investigation into their use. Providing clinicians and clinical researchers with automated tools for analyzing discourse without the need for prohibitively labor-intensive manual scoring could be a paradigm shift, potentially revolutionizing the aphasia intervention landscape.
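
The two agreement statistics named in the Method can be illustrated with a minimal Python sketch. It is not the authors' analysis code. The abstract does not specify the G-theory design, so the sketch assumes the simplest case: a fully crossed transcripts-by-raters design with one score per cell and two "raters" (human and LLM). The function names `pearson_r` and `g_coefficients` and the score values are hypothetical, made up for illustration only.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson product-moment correlation between two score vectors."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))


def g_coefficients(scores):
    """Relative (G) and absolute (Phi) coefficients for an ASSUMED
    one-facet, fully crossed transcripts x raters design.

    scores: array of shape (n_transcripts, n_raters), one score per cell.
    """
    scores = np.asarray(scores, dtype=float)
    n_p, n_r = scores.shape
    grand = scores.mean()
    p_means = scores.mean(axis=1)  # per-transcript means
    r_means = scores.mean(axis=0)  # per-rater means

    # Mean squares from the two-way ANOVA without replication.
    ms_p = n_r * np.sum((p_means - grand) ** 2) / (n_p - 1)
    ms_r = n_p * np.sum((r_means - grand) ** 2) / (n_r - 1)
    resid = scores - p_means[:, None] - r_means[None, :] + grand
    ms_pr = np.sum(resid ** 2) / ((n_p - 1) * (n_r - 1))

    # Variance components; negative estimates are truncated at zero.
    var_p = max((ms_p - ms_pr) / n_r, 0.0)
    var_r = max((ms_r - ms_pr) / n_p, 0.0)
    var_pr = ms_pr  # interaction confounded with residual error

    g_rel = var_p / (var_p + var_pr / n_r)
    phi = var_p / (var_p + (var_r + var_pr) / n_r)
    return float(g_rel), float(phi)


# Hypothetical MC composite ratio scores for five transcripts.
human = [0.20, 0.45, 0.60, 0.75, 0.90]
llm = [0.25, 0.40, 0.65, 0.70, 0.95]

r = pearson_r(human, llm)
g_rel, phi = g_coefficients(np.column_stack([human, llm]))
print(f"Pearson r = {r:.3f}, G = {g_rel:.3f}, Phi = {phi:.3f}")
```

The Pearson coefficient captures only the linear association, so an LLM that scored systematically higher than the human raters could still correlate strongly with them. The absolute coefficient (Phi) also penalizes such rater-level offsets, which is one reason to report both kinds of statistic.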
