Computer science
Natural language processing
Artificial intelligence
Grading (engineering)
Sentence
Readability
Polysemy
Word embedding
Reading (process)
Language model
Speech recognition
Embedding
Linguistics
Engineering
Philosophy
Civil engineering
Programming language
Authors
Yuchen Wang, Juxiang Zhou, Zijie Li, Shu Zhang, Xiaoyu Han
Identifier
DOI:10.1109/tlt.2023.3319582
Abstract
Graded reading is an important approach to English learning, and automatically judging and grading the difficulty of English reading corpora is of great significance for precision teaching and personalized learning. However, current rule-based readability assessment methods have limitations such as low efficiency and poor accuracy. In particular, these traditional methods usually ignore semantics, which is crucial for students' comprehension of reading material. Moreover, their outputs are difficult to map onto discrete difficulty levels, which limits their flexible application in actual personalized teaching. In this study, a method for grading the difficulty of English reading corpora is proposed. The approach uses a pretrained language model and feature-fusion embedding to exploit multifeature data during training. First, based on linguists' evaluations of the variables influencing the difficulty of an English reading corpus, three primary statistical features (sentence length, word length, and the number of prepositions) are considered. Then, the semantic features and part-of-speech features of the text are learned by a pretrained language model and a long short-term memory network, respectively, to capture polysemy and fine-grained semantic representations that are difficult to express with traditional models. Finally, the extracted multifeature embeddings are fused to grade the difficulty of the English reading corpus. Extensive experimental comparisons with various models on a self-built dataset and two freely accessible datasets indicate that our method outperforms the others on the task of grading the difficulty of English reading corpora.
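To make the described pipeline concrete, the following is a minimal sketch of a feature-fusion difficulty grader in the spirit of the abstract: a pretrained language model supplies a contextual (polysemy-aware) semantic vector, a bidirectional LSTM encodes the part-of-speech tag sequence, and the three statistical features are concatenated before a classification head predicts the difficulty level. All class names, dimensions, the choice of encoder, and the fusion strategy are illustrative assumptions, not the authors' exact implementation.

```python
# Hedged sketch of a multifeature fusion grader (assumed architecture, not the paper's code).
import torch
import torch.nn as nn
from transformers import AutoModel


class FeatureFusionGrader(nn.Module):
    def __init__(self, num_levels: int, pos_vocab_size: int,
                 pos_embed_dim: int = 64, lstm_hidden: int = 128,
                 num_stat_features: int = 3,
                 encoder_name: str = "bert-base-uncased"):
        super().__init__()
        # Pretrained language model for contextual semantic features.
        self.encoder = AutoModel.from_pretrained(encoder_name)
        sem_dim = self.encoder.config.hidden_size
        # Part-of-speech tag sequence modeled with a bidirectional LSTM.
        self.pos_embed = nn.Embedding(pos_vocab_size, pos_embed_dim, padding_idx=0)
        self.pos_lstm = nn.LSTM(pos_embed_dim, lstm_hidden,
                                batch_first=True, bidirectional=True)
        # Fused vector: semantics + POS encoding + statistical features
        # (sentence length, word length, preposition count).
        fused_dim = sem_dim + 2 * lstm_hidden + num_stat_features
        self.classifier = nn.Sequential(
            nn.Linear(fused_dim, 256), nn.ReLU(), nn.Dropout(0.1),
            nn.Linear(256, num_levels),
        )

    def forward(self, input_ids, attention_mask, pos_ids, stat_features):
        # [CLS]-position pooled semantic representation of the passage.
        sem = self.encoder(input_ids=input_ids,
                           attention_mask=attention_mask).last_hidden_state[:, 0]
        # Final forward/backward hidden states of the POS LSTM.
        _, (h_n, _) = self.pos_lstm(self.pos_embed(pos_ids))
        pos_vec = torch.cat([h_n[-2], h_n[-1]], dim=-1)
        # Concatenate all feature groups and predict a difficulty level.
        fused = torch.cat([sem, pos_vec, stat_features], dim=-1)
        return self.classifier(fused)
```

Under these assumptions, training would proceed as ordinary multiclass classification (e.g., cross-entropy over the difficulty levels), with the statistical features computed per passage and standardized before being passed in as `stat_features`.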