Clarity
Grading (engineering)
Computer science
Metacognition
Cognition
Relevance (law)
Consistency (knowledge bases)
Artificial intelligence
Psychology
Biochemistry
Chemistry
Civil engineering
Neuroscience
Political science
Law
Engineering
Authors
Kangkang Li, Qian Yang, Xianmin Yang
Identifier
DOI:10.1109/tlt.2024.3394807
Abstract
The student-generated questions (SGQs) strategy is an effective instructional strategy for developing students' higher-order cognitive and critical thinking skills. However, assessing the quality of SGQs is time-consuming and requires intensive involvement from domain experts. Previous automatic evaluation work has focused on surface-level features of questions. To overcome this limitation, the state-of-the-art language models GPT-3.5 and GPT-4.0 were used to evaluate 1084 SGQs for topic relevance, clarity of expression, answerability, difficulty, and cognitive level. Results showed that GPT-4.0 exhibited better grading consistency with experts than GPT-3.5 in terms of topic relevance, clarity of expression, answerability, and difficulty level, whereas both GPT-3.5 and GPT-4.0 had low consistency with experts on cognitive level. Over three rounds of testing, GPT-4.0 also demonstrated higher stability in auto-grading than GPT-3.5. In addition, to validate the effectiveness of GPT in evaluating SGQs from different domains and subjects, we conducted the same experiment on a subset of the LearningQ dataset. We also discuss the attitudes of teachers and students toward automatic grading by GPT models. The findings underscore the potential of GPT-4.0 to assist teachers in evaluating the quality of SGQs. Nevertheless, assessing the cognitive level of SGQs still requires manual examination by teachers.
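The abstract does not include the authors' prompts or analysis code, so the following is only a minimal sketch of how GPT-based rubric grading of an SGQ and an agreement check against expert ratings might be set up. The prompt wording, the 1-5 rating scale, the `gpt-4` model name, and the use of Cohen's kappa as the consistency measure are all assumptions, not details taken from the paper.

```python
# Hypothetical sketch: rate one student-generated question on one rubric
# dimension with the OpenAI chat API, then compare model and expert ratings.
# Dimension names follow the abstract; everything else is assumed.
from openai import OpenAI
from sklearn.metrics import cohen_kappa_score

client = OpenAI()  # reads OPENAI_API_KEY from the environment

DIMENSIONS = ["topic relevance", "clarity of expression", "answerability",
              "difficulty", "cognitive level"]

def grade_question(question: str, topic: str, dimension: str) -> int:
    """Ask the model for a 1-5 rating of the question on one rubric dimension."""
    prompt = (
        f"Topic: {topic}\n"
        f"Student-generated question: {question}\n"
        f"Rate the question's {dimension} on a 1-5 scale. Reply with the number only."
    )
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # keep grading as deterministic as possible across rounds
    )
    return int(resp.choices[0].message.content.strip())

# Agreement between model and expert ratings on one dimension (toy data).
expert_scores = [4, 3, 5, 2, 4]
model_scores = [4, 3, 4, 2, 4]
print(cohen_kappa_score(expert_scores, model_scores, weights="quadratic"))
```

In such a setup, each question would be graded once per dimension and per round, and grading stability across rounds could be checked by repeating the calls and comparing the resulting score vectors.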