Computer science
Artificial intelligence
Artificial neural network
Machine learning
Maximization
Expectation-maximization algorithm
Probabilistic logic
Benchmark (surveying)
Noise (video)
Ground truth
Pattern recognition (psychology)
Data mining
Maximum likelihood
Geodesy
Image (mathematics)
Economy
Microeconomics
Statistics
Geography
Mathematics
Authors
Junfan Chen, Richong Zhang, Jie Xu, Chunming Hu, Yongyi Mao
Identifier
DOI:10.1109/tkde.2022.3223067
Abstract
Multi-label text classification (MLTC) has a wide range of real-world applications. Neural networks have recently improved the performance of MLTC models, but training these models relies on sufficient, accurately labelled data. However, manually annotating large-scale multi-label text classification datasets is expensive and impractical for many applications. Weak supervision techniques have thus been developed to reduce the cost of annotating text corpora; these techniques, however, introduce noisy labels into the training data and may degrade model performance. This paper aims to deal with such noisy-label problems in MLTC in both single-instance and multi-instance settings. We build a novel Neural Expectation-Maximization Framework (nEM) that combines neural networks with probabilistic modelling. The nEM framework produces text representations using neural-network text encoders and is optimized with the Expectation-Maximization algorithm. It naturally accounts for noisy labels during learning by iteratively updating the model parameters and estimating the distribution of the ground-truth labels. We evaluate our nEM framework on multi-instance noisy MLTC, using a benchmark relation extraction dataset constructed by distant supervision, and on single-instance noisy MLTC, using synthetic noisy datasets constructed by keyword supervision and label flipping. The experimental results demonstrate that nEM significantly improves upon baseline models in both single-instance and multi-instance noisy MLTC tasks. Our analysis further suggests that the nEM framework effectively reduces the noisy labels in MLTC datasets and thereby significantly improves model performance.
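As a rough illustration of the EM scheme the abstract describes (alternating between estimating a posterior distribution over the ground-truth labels and updating model parameters), here is a minimal NumPy sketch. It is not the paper's nEM implementation: a linear logistic classifier stands in for the neural text encoder, the known per-label flip probability `flip_prob` is an assumed noise model, and the synthetic data and all variable names are illustrative assumptions.

```python
import numpy as np

# Minimal sketch of an EM loop for noisy multi-label classification.
# NOTE: the simple logistic model and known flip-probability noise model
# below are illustrative assumptions, not the paper's nEM architecture.

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Synthetic data: n documents with d features and k independent labels.
n, d, k = 500, 20, 3
X = rng.normal(size=(n, d))
W_true = rng.normal(size=(d, k))
Y_true = (sigmoid(X @ W_true) > 0.5).astype(float)

# Corrupt the labels with a fixed flip probability (assumed noise model).
flip_prob = 0.2
flips = rng.random(Y_true.shape) < flip_prob
Y_noisy = np.where(flips, 1.0 - Y_true, Y_true)

W = np.zeros((d, k))                      # classifier parameters
lr, n_em_iters, n_grad_steps = 0.1, 10, 50

for em_iter in range(n_em_iters):
    # E-step: posterior q(y=1) of the ground-truth label, combining the
    # model prediction (prior) with the observed noisy label (likelihood).
    prior = sigmoid(X @ W)
    like1 = np.where(Y_noisy == 1, 1 - flip_prob, flip_prob)  # p(y_noisy | y=1)
    like0 = np.where(Y_noisy == 1, flip_prob, 1 - flip_prob)  # p(y_noisy | y=0)
    q = like1 * prior / (like1 * prior + like0 * (1 - prior))

    # M-step: fit the classifier to the soft posterior targets by gradient
    # ascent on the expected complete-data log-likelihood.
    for _ in range(n_grad_steps):
        pred = sigmoid(X @ W)
        W += lr * X.T @ (q - pred) / n

    acc = ((sigmoid(X @ W) > 0.5) == Y_true).mean()
    print(f"EM iter {em_iter}: accuracy vs. ground truth = {acc:.3f}")
```

The E-step is simply Bayes' rule applied per label, with the classifier output as the prior and the assumed noise model supplying the likelihood of the observed noisy label; the M-step then fits the classifier to the resulting soft targets. The paper's framework replaces the linear layer with neural text encoders and handles the multi-instance setting, which this sketch does not cover.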