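Grading (engineering)
Process (computing)
Computer science
Psychology
Mathematics education
Process management
Medical education
Engineering
Programming language
Medicine
Civil engineering
Identifier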
DOI:10.1016/j.tsc.2024.101522
Abstract
This research evaluated ChatGPT's potential as a tool for grading programming tasks, exploring its capability to understand and assess code quality. The study took place over a 15-week Python programming course with 67 students of the Cognitive Science program. Nine different assignments were assessed by both a teacher and ChatGPT, and the grading differences were recorded. The teacher's grades were higher than those generated by ChatGPT. Despite this, there was a strong positive correlation between the two sets of grades, suggesting broad agreement in grading. In addition, the repeatability of ChatGPT's evaluations was excellent, and the differences observed between successive grading iterations were negligible. The study concludes that ChatGPT could be a beneficial tool for grading programming assignments, offering advantages such as time efficiency, quality assessment, unbiased grading, enforcement of coding standards, and the ability to generate feedback. However, the system has limitations such as cost, potential hallucinations, lack of absolute agreement and of fully reproducible results, and the occasional need for teacher intervention. The study suggests that the artificial intelligence model could complement or even substitute human grading, but that it requires careful use and, potentially, verification by a human teacher.