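Boosting (machine learning)
Decision tree
Computer science
Gradient boosting
Random forest
Plagiarism detection
Coding (set theory)
Tree (set theory)
Artificial intelligence
Machine learning
Mathematics
Programming language
Mathematical analysis
Set (abstract data type)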
Authors
Huang Qiubo,Tian Jingdong,Fang Guozheng
Source
Journal: International Conference on Data Mining
Date: 2019-04-28
Citations: 1
Identifier
DOI:10.1145/3335656.3335692
Abstract
This paper studies the Online Judge System used for assignments such as programming exercises. Code submitted by students is sometimes plagiarized [1]. In addition to calculating the similarity between submissions, we extract other features to determine whether a submitted program is suspected of plagiarism. By combining a Random Forest with a Gradient Boosting Decision Tree (GBDT), we can also obtain its level of suspicion. The model first calculates the similarity between the newly submitted code and all previously submitted code and flags suspected plagiarism. For code whose status is difficult to confirm, we extract features such as programming-style similarity and the student's submission behavior pattern (for example, the similar target concentration degree) and use them to build tree ensembles, namely a Random Forest and a GBDT, which help determine the level of plagiarism suspicion. If the level is medium, the teacher labels the code as plagiarized or not. Finally, the learning model is trained incrementally to improve its accuracy and the classification results. Experimental results show an accuracy of up to 95.9%. The model can therefore prevent students from plagiarizing while minimizing the teacher's workload.
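The abstract outlines a two-stage triage. First comes a similarity screen against all earlier submissions. Ambiguous cases then go to Random Forest and GBDT scoring, and medium-level cases are sent to the teacher. The minimal Python sketch below shows only that control flow and rests on several assumptions that are not taken from the paper. `difflib.SequenceMatcher` on comment-stripped text stands in for the paper's unspecified similarity measure. The thresholds are invented for illustration. The hypothetical `ensemble_probs` hook is assumed to return one plagiarism probability from each ensemble, and the two probabilities are combined by simple averaging. Feature extraction and incremental training are omitted.

```python
import difflib
import re
from typing import Callable, Dict, Tuple

# Hypothetical cut-offs; the abstract does not report the values the paper used.
HIGH_SIMILARITY = 0.9  # at or above this, flag the submission as a suspect directly
LOW_SIMILARITY = 0.5   # below this, treat the submission as not suspicious


def normalize(code: str) -> str:
    """Remove C-style comments and collapse whitespace (assumes C/C++/Java-like syntax)."""
    code = re.sub(r"/\*.*?\*/", "", code, flags=re.S)  # block comments
    code = re.sub(r"//[^\n]*", "", code)                # line comments
    return " ".join(code.split())


def max_similarity(new_code: str, submitted: Dict[str, str]) -> Tuple[str, float]:
    """Return (submission id, ratio) of the most similar earlier submission."""
    target = normalize(new_code)
    best_id, best_ratio = "", 0.0
    for sub_id, code in submitted.items():
        ratio = difflib.SequenceMatcher(None, target, normalize(code)).ratio()
        if ratio > best_ratio:
            best_id, best_ratio = sub_id, ratio
    return best_id, best_ratio


def suspicion_level(p_rf: float, p_gbdt: float) -> str:
    """Map two plagiarism probabilities to a level; averaging and cut-offs are assumptions."""
    p = (p_rf + p_gbdt) / 2
    if p >= 0.7:
        return "high"
    if p >= 0.3:
        return "medium"  # would be routed to the teacher for a manual label
    return "low"


def triage(
    new_code: str,
    submitted: Dict[str, str],
    ensemble_probs: Callable[[str, str], Tuple[float, float]],
) -> str:
    """Stage 1: similarity screen. Stage 2: ensemble scoring for ambiguous cases.

    `ensemble_probs` is a hypothetical hook that would extract style/behavior
    features and return (Random Forest probability, GBDT probability).
    """
    sub_id, sim = max_similarity(new_code, submitted)
    if sim >= HIGH_SIMILARITY:
        return "high"
    if sim < LOW_SIMILARITY:
        return "low"
    p_rf, p_gbdt = ensemble_probs(new_code, sub_id)
    return suspicion_level(p_rf, p_gbdt)


if __name__ == "__main__":
    history = {"s1": "int x = 1; // set x", "s2": "while (1) {}"}

    def stub(code: str, sub_id: str) -> Tuple[float, float]:
        return 0.5, 0.4  # stand-in for trained Random Forest / GBDT models

    print(triage("int x = 1;", history, stub))  # prints: high
```

Averaging is only one way to combine the two ensembles. Stacking or taking the maximum would fit the abstract's description equally well, because the abstract does not state the combination rule.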