Crowdsourcing
Inference
Computer science
Algorithm
Artificial intelligence
Data science
World Wide Web
Authors
Xuan Wei, Mingyue Zhang, Qingpeng Zhang, Zhi Li, Daniel Zeng
Source
Journal: INFORMS Journal on Computing
Date: 2025-09-15
Identifier
DOI: 10.1287/ijoc.2023.0440
Abstract
Crowdsourcing has become a pivotal strategy in gathering large-scale, high-quality labeled data, particularly in data-intensive applications powered by artificial intelligence. To aggregate the noisy crowd efforts, many studies have considered learning a predictive algorithm based on the noisy human annotations and subsequently integrating the learned knowledge back into the data aggregation process. However, it is unclear how to design such hybrid systems that maximize the complementary strengths of humans and algorithms. In response, we analyze the patterns of human and algorithm intelligence and propose that the inductive bias of algorithms can effectively mitigate inconsistencies in human labeling, thus complementing human efforts. Building on this premise, we propose a human-algorithm collaborative framework (HAC) to combine human labels with algorithmic predictions. By proposing a metric called hybrid complementarity score (HCS) to quantify human-algorithm complementarity, our framework can dynamically adjust the weight of each algorithm based on its complementarity, significantly enhancing the overall efficacy of the human-algorithm integration. To validate the effectiveness of our framework, we first instantiate it with several algorithms, including a high-complementarity algorithm building upon the inductive bias of clustering-aware design. We then benchmark our framework against leading baselines across eight real-world tasks. Our results not only demonstrate the superior performance of our proposed framework but also affirm its robustness across different algorithm selections (e.g., types and number of algorithms) and hyperparameter configurations. This research not only delivers a feasible and effective solution for truth inference in crowdsourcing but also contributes to the burgeoning community of human-algorithm collaboration.
History: Accepted by Ram Ramesh, Area Editor for Data Science and Machine Learning.
Funding: X. Wei is supported by the National Natural Science Foundation of China (NSFC) [Grants 72201167, 72192822, 72571175, 72331006, 72221001, and 72232005] and the Young Elite Scientists Sponsorship Program by CAST [Grant 2023QNRC001]. D. D. Zeng is supported by NSFC [Grant 72293575]. M. Zhang is supported by NSFC [Grant 72272101]. Q. Zhang is supported by the General Research Fund of the Research Grant Council of Hong Kong [Grant 17209225]. X. Wei also thanks the Science and Technology Commission of Shanghai Municipality [Grant 22JC1403600].
Supplemental Material: The software that supports the findings of this study is available within the paper and its Supplemental Information (https://pubsonline.informs.org/doi/suppl/10.1287/ijoc.2023.0440) as well as from the IJOC GitHub software repository (https://github.com/INFORMSJoC/2023.0440). The complete IJOC Software and Data Repository is available at https://informsjoc.github.io/.