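Crowdsourcing
Inference
Cluster analysis
Computer science
Data mining
Set (abstract data type)
Ground truth
Cluster (spacecraft)
Artificial intelligence
Machine learning
Reliability (semiconductor)
Class (philosophy)
Power (physics)
Physics
Quantum mechanics
World Wide Web
Programming language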
Authors
Gongqing Wu, Liangzhu Zhou, Jiazhu Xia, Lei Li, Xianyu Bao, Xindong Wu
Abstract
Truth inference can help solve some difficult problems of data integration in crowdsourcing. Crowdsourced workers are not experts, and their labeling ability varies greatly; in practical applications it is therefore difficult to determine whether the labels collected from a crowdsourcing platform are correct. This article proposes a novel algorithm called truth inference based on label confidence clustering (TILCC) to improve the quality of integrated labels for the single-choice classification problem in crowdsourcing labeling tasks. We obtain the label confidence via worker reliability, which is calculated from multiple noisy labels using a truth discovery method. We then generate the clustering features and use the K-means algorithm to cluster all the tasks into K different clusters. Each cluster corresponds to a specific class, and the tasks in the cluster are assigned that class's label. Compared with six state-of-the-art methods (MV, ZenCrowd, PM, CATD, GLAD, and GTIC) on 12 randomly selected real-world datasets, our algorithm shows several advantages: it requires no complex parameter settings, runs faster, and achieves significantly higher accuracy.
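The pipeline described in the abstract (worker reliability, then label confidence, then K-means clustering, then one class per cluster) can be illustrated with a minimal sketch. This is not the authors' TILCC implementation, since the abstract does not give the exact formulas. The function names are hypothetical, the reliability step is a simple iterative weighted vote standing in for the paper's truth discovery method, the confidence features are assumed to be normalized reliability-weighted vote shares, and the one-hot centroid initialization and argmax cluster-to-class mapping are assumptions as well.

```python
from collections import defaultdict


def worker_reliability(votes, n_classes, n_iter=10):
    """votes: {task: {worker: label}}. Returns {worker: reliability in [0, 1]}.

    Stand-in for a truth discovery method (assumption): alternate between a
    reliability-weighted vote and re-scoring each worker against that vote.
    """
    workers = {w for task_votes in votes.values() for w in task_votes}
    rel = {w: 1.0 for w in workers}  # equal weights = plain majority voting
    for _ in range(n_iter):
        truth = {}
        for task, task_votes in votes.items():
            score = [0.0] * n_classes
            for w, y in task_votes.items():
                score[y] += rel[w]
            truth[task] = max(range(n_classes), key=score.__getitem__)
        hits, total = defaultdict(int), defaultdict(int)
        for task, task_votes in votes.items():
            for w, y in task_votes.items():
                total[w] += 1
                hits[w] += int(y == truth[task])
        rel = {w: hits[w] / total[w] for w in workers}
    return rel


def confidence_features(votes, rel, n_classes):
    """Per-task feature vector: normalized reliability mass behind each class."""
    feats = {}
    for task, task_votes in votes.items():
        score = [0.0] * n_classes
        for w, y in task_votes.items():
            score[y] += rel[w]
        norm = sum(score) or 1.0  # avoid division by zero
        feats[task] = [s / norm for s in score]
    return feats


def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, centroids, n_iter=20):
    """Plain Lloyd iterations from the given initial centroids."""
    centroids = [list(c) for c in centroids]

    def nearest(p):
        return min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))

    for _ in range(n_iter):
        assign = [nearest(p) for p in points]
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return [nearest(p) for p in points], centroids


def infer_labels(votes, n_classes):
    rel = worker_reliability(votes, n_classes)
    feats = confidence_features(votes, rel, n_classes)
    tasks = sorted(feats)
    # Assumption: start cluster k at the one-hot vector of class k.
    init = [[1.0 if i == k else 0.0 for i in range(n_classes)]
            for k in range(n_classes)]
    assign, centroids = kmeans([feats[t] for t in tasks], init)
    # Assumption: a cluster takes the class with the largest centroid entry.
    cluster_class = [max(range(n_classes), key=c.__getitem__) for c in centroids]
    return {t: cluster_class[a] for t, a in zip(tasks, assign)}


# Toy data (made up for illustration): workers "a" and "b" agree, "c" is noisy.
votes = {
    "t1": {"a": 0, "b": 0, "c": 1},
    "t2": {"a": 1, "b": 1, "c": 0},
    "t3": {"a": 0, "b": 0, "c": 0},
    "t4": {"a": 1, "b": 1, "c": 1},
}
labels = infer_labels(votes, n_classes=2)
```

In this sketch the number of clusters equals the number of classes, matching the abstract's statement that each cluster corresponds to a specific class. Nothing here prevents two clusters from mapping to the same class, which a fuller implementation would need to handle.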