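Computer science
Reading comprehension
Readability
Narrative
Context (archaeology)
Reading (process)
Artificial intelligence
Linguistics
Paleontology
Biology
Philosophy
Programming language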
Authors
Lisa Schmitz, Philipp Sonnleitner
Identifier
DOI:10.1186/s40536-025-00255-w
Abstract
Background: The increasing capabilities of generative artificial intelligence (AI), exemplified by OpenAI’s transformer-based language model GPT-4 (ChatGPT), have drawn attention to its application in educational contexts. This study evaluates the potential of such models in generating German reading comprehension texts for educational large-scale assessments, within the multilingual context of Luxembourg. Addressing the challenges faced by item developers in sourcing or manually developing numerous suitable texts, the study aims to determine if ChatGPT can assist text creation while maintaining high-quality standards.
Methods: The study employed a mixed-methods approach. In a qualitative focus group discussion, experts identified the strengths, weaknesses, opportunities and threats (SWOT) of using GPT-4 for text generation. These insights informed the construction of a Text Analysis Cognitive Model (TACM), which served as the theoretical foundation. Narrative and informative reading comprehension texts were then generated using two distinct prompt engineering techniques, derived from original passages and TACM specifications. In a blinded online review, N = 89 participants evaluated human-written and AI-generated texts with regard to their readability, correctness, coherence, engagement and adequacy for reading assessment.
Results: All administered texts were of similarly high quality, with reviewers being unable to consistently identify authorship origins. Quantitative evaluations indicated that one-shot prompts are effective for creating high-quality informative texts, whereas human-written texts remain superior for narratives. Zero-shot prompts offer considerable flexibility and creativity, but still require human refinement.
Conclusion: These findings offer promising first insights into GPT-4’s capacity to emulate human-written texts which can be used in the large-scale assessment context. The considerable potential of using generative AI models as a flexible and efficacious assistant in the creation of reading comprehension texts is highlighted. Still, the necessity of human oversight is emphasized through an augmented intelligence-driven perspective. Given the jurisdictional framework of the European Union, an effective implementation of ChatGPT in the test development process remains hypothetical at this time but is likely to change.