Computer science
Word (group theory)
Similarity (geometry)
Diversity (cybernetics)
Topic model
Natural language processing
Context (archaeology)
Cluster analysis
Artificial intelligence
Information retrieval
Social media
World Wide Web
Linguistics
Philosophy
Paleontology
Image (mathematics)
Biology
Authors
Ximing Li,Ang Zhang,Changchun Li,Lantian Guo,Wenting Wang,Jihong Ouyang
Identifier
DOI:10.1093/comjnl/bxy037
Abstract
Short texts, such as Twitter social media posts, have become increasingly popular on the Internet. Inferring topics from massive numbers of short texts is important for many real-world applications. Because a single short text often contains only a few words, traditional topic models are less effective on such data. The recently developed biterm topic model (BTM) models short texts effectively by capturing rich global word co-occurrence information. However, in the sparse short-text setting, many highly related words may never co-occur, so BTM can miss potentially coherent and prominent word co-occurrence patterns that are not observable in the corpus. To address this problem, we propose a novel relational BTM (R-BTM) model, which links short texts using a similarity list of words computed from word embeddings. To evaluate the effectiveness of R-BTM, we compare it against existing short-text topic models on a variety of standard tasks, including topic quality, clustering and text similarity. Experimental results on real-world datasets indicate that R-BTM outperforms baseline topic models for short texts.
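The abstract does not spell out how the word similarity lists are built, but the general idea of ranking vocabulary words by embedding similarity can be illustrated with a minimal sketch. The snippet below is not the authors' implementation: it assumes a toy embedding dictionary and simply keeps, for each word, its top-k cosine-similarity neighbours, which is the kind of list R-BTM is described as using to link words that rarely co-occur in sparse short texts.

```python
# Minimal sketch (not the authors' code) of per-word similarity lists
# from word embeddings, as hinted at in the abstract.
import numpy as np

# Hypothetical toy embeddings: word -> dense vector. In practice these
# would come from pretrained vectors such as word2vec or GloVe.
embeddings = {
    "tweet":   np.array([0.9, 0.1, 0.0]),
    "post":    np.array([0.8, 0.2, 0.1]),
    "topic":   np.array([0.1, 0.9, 0.2]),
    "cluster": np.array([0.0, 0.8, 0.3]),
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def similarity_list(word, k=2):
    """Return the k vocabulary words most similar to `word`."""
    scores = [
        (other, cosine(embeddings[word], vec))
        for other, vec in embeddings.items()
        if other != word
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]

if __name__ == "__main__":
    # Words that rarely co-occur in short texts (e.g. "tweet" and "post")
    # can still be linked through such similarity lists.
    print(similarity_list("tweet"))
```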