Word2vec
Computer science
Latent Dirichlet allocation
Topic model
Word embedding
tf–idf
Sentiment analysis
Word (group theory)
Non-negative matrix factorization
Vectorization (mathematics)
Information retrieval
Artificial intelligence
Natural language processing
Term (time)
Data mining
Matrix decomposition
Embedding
Mathematics
Quantum mechanics
Geometry
Physics
Eigenvector
Parallel computing
Authors
Nuraisa Novia Hidayati,Putri Damayanti,Agus Zainal Arifin
Abstract
Tweet data from several official Twitter accounts of news portals can provide near-real-time traffic information, which helps in managing smooth mobility. However, this data is mixed with news on current issues, such as government policies and the pandemic situation. A news-grouping step is therefore needed: word vectors obtained through word embedding are fed into topic modeling to help separate traffic news from other news. We compared two methods that have been well tested on Twitter data across various categories: Latent Dirichlet Allocation (LDA) and Non-negative Matrix Factorization (NMF). Previous research indicates that with both methods the words that compose a topic remain quite difficult to interpret. Therefore, we use Word2vec as input and compare it against the very commonly used term frequency-inverse document frequency (TF-IDF), expecting that Word2vec groups related words and, in turn, yields a better division of topics. This study shows that LDA combined with Word2vec word vectorization achieves a coherence value of 0.56, compared with 0.57 for TF-IDF. For NMF, however, applying Word2vec gives better results than TF-IDF: TF-IDF reaches a coherence value of only 0.49, while Word2vec reaches 0.52. Furthermore, with NMF the Word2vec model successfully recognizes words that denote locations. Once the traffic news has been separated, we apply Named Entity Recognition (NER) to detect the location of an incident. We labeled the locations in 30% of the grouped tweet data as training data, and this method successfully detected locations when tested on other data.
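As an illustration of the coherence-based comparison described in the abstract, the following is a minimal sketch (not the authors' code) that fits an LDA topic model on a toy tokenized tweet corpus with gensim and scores it with the c_v coherence measure. The corpus, topic count, and parameters are assumptions for demonstration only; the Word2vec- and TF-IDF-based vectorization steps and the NMF/NER stages are not shown.

# Minimal sketch: LDA topic modeling plus c_v coherence scoring with gensim.
# The tiny tokenized "tweets" below are hypothetical examples, not the paper's data.
from gensim.corpora import Dictionary
from gensim.models import LdaModel, CoherenceModel

tweets = [
    ["traffic", "jam", "toll", "road", "accident"],
    ["government", "policy", "pandemic", "vaccine"],
    ["road", "closed", "flood", "traffic", "detour"],
    ["covid", "cases", "policy", "hospital"],
]

# Build the vocabulary and a bag-of-words corpus.
dictionary = Dictionary(tweets)
bow_corpus = [dictionary.doc2bow(doc) for doc in tweets]

# Fit LDA; num_topics=2 and passes=10 are illustrative choices.
lda = LdaModel(bow_corpus, num_topics=2, id2word=dictionary,
               random_state=0, passes=10)

# Score the learned topics with the c_v coherence measure,
# the kind of coherence value reported in the abstract (e.g., 0.56 vs 0.57).
coherence = CoherenceModel(model=lda, texts=tweets,
                           dictionary=dictionary,
                           coherence="c_v").get_coherence()
print("topics:", lda.print_topics())
print("c_v coherence:", round(coherence, 2))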