Heterogeneous graph contrastive learning with adaptive data augmentation for semi‐supervised short text classification

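Computer science, Artificial intelligence, Graph, Machine learning, Labeled data, Pattern recognition (psychology), Data mining, Theoretical computer science
Authors
Mingqiang Wu, Zhuoming Xu, Lei Zheng
Source
Journal: Expert Systems [Wiley]
Identifier
DOI: 10.1111/exsy.13744
Abstract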

Short text classification is widely used in many fields. Because labelled data are scarce, performing short text classification in a semi‐supervised learning setting has become increasingly popular. Semi‐supervised short text classification methods based on graph neural networks can achieve state‐of‐the‐art classification performance by exploiting the expressive power of graph neural networks. However, these methods usually fail to mine the hidden patterns in the large amount of short text node data in the graph to optimize the short text node embeddings. This limits the semantic representation power of the short texts and leads to suboptimal classification performance. To overcome this limitation, this paper proposes a novel semi‐supervised short text classification method called Heterogeneous Graph Contrastive Learning with Adaptive Data Augmentation (HGCLADA). In the knowledge‐base‐guided, soft prompt‐based data augmentation component, words related to the tag words are used to optimize the soft prompts for generating diverse augmented samples. In the heterogeneous graph contrastive learning framework component, the paper proposes a heterogeneous graph constructed from short texts and keywords, together with an effective edge augmentation scheme based on a short text clustering algorithm. The resulting optimized short text embeddings enable effective semi‐supervised short text classification. Extensive experiments on six benchmark datasets show that HGCLADA outperforms four classes of state‐of‐the‐art methods in classification accuracy, with a particularly significant improvement of 8.74% on the TagMyNews dataset when each class contains only 20 labelled samples.
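
As a rough illustration of the contrastive learning idea mentioned in the abstract, the sketch below computes an InfoNCE-style loss between two views of the same short text embeddings. This is not the authors' implementation: the abstract does not state which contrastive objective HGCLADA uses, so the choice of InfoNCE, the function names `cosine` and `info_nce`, and the `temperature` value are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(view_a, view_b, temperature=0.5):
    """Mean InfoNCE loss over a batch (illustrative assumption, not from the paper).

    Row i of view_a is the anchor, row i of view_b is its positive,
    and every other row of view_b acts as a negative.
    """
    n = len(view_a)
    total = 0.0
    for i in range(n):
        logits = [cosine(view_a[i], view_b[j]) / temperature for j in range(n)]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_denominator - logits[i]
    return total / n


# Hypothetical toy embeddings: two short texts seen under two graph views.
original_view = [[1.0, 0.0], [0.0, 1.0]]
aligned_view = [[1.0, 0.0], [0.0, 1.0]]     # positives agree with anchors
mismatched_view = [[0.0, 1.0], [1.0, 0.0]]  # positives disagree with anchors

aligned_loss = info_nce(original_view, aligned_view)
mismatched_loss = info_nce(original_view, mismatched_view)
```

In this toy setup the loss is lower when the two views of each text agree, which is the signal a contrastive objective uses to pull embeddings of the same short text together and push different texts apart.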
