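Computer science
Natural language processing
Artificial intelligence
Word (group theory)
Embedding
Context (archaeology)
Word embedding
Ground truth
Information retrieval
Class (philosophy)
Training set
Linguistics
Biology
Philosophy
Paleontology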
Authors
Diana Purwitasari, Ana Alimatus Zaqiyah, Chastine Fatichah
Identifier
DOI: 10.1109/icacsis53237.2021.9631315
Abstract
Generating minority-class data for spam texts is expected to address the class imbalance problem in spam detection for product reviews. However, the generated texts may differ semantically from the original ones, so including semantically divergent texts in the spam dataset used for training amounts to adding noise. Evaluating the generated texts normally requires manual preparation of ground-truth data. This work evaluates the generated texts with several approaches that check their contextual and sequential similarity to the original texts, with the aim of improving spam detection performance. The employed approaches are expected to eliminate the manual tasks. Our research proposes an evaluation model that combines pre-trained word embeddings with a Siamese LSTM to evaluate text generation for imbalanced review data. Combining pre-trained word embeddings with a trained Siamese LSTM model allows the evaluation to capture the semantic aspect of the text.
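
The general shape of such an evaluation model can be illustrated with a minimal sketch. It is not the authors' implementation: the embedding table is random rather than pre-trained, the LSTM weights are untrained, the dimensions are arbitrary, and the similarity function (the exponential of the negative L1 distance between the two final hidden states) is an assumption, since the abstract does not specify one. All names (`EMBEDDINGS`, `SHARED_LSTM`, `encode`, `similarity`) are hypothetical.

```python
import math
import random

random.seed(0)

EMB_DIM = 4   # size of the toy word vectors (assumption)
HID_DIM = 3   # LSTM hidden size (assumption)

# Toy stand-in for a pre-trained embedding table; a real setup would load
# pre-trained word vectors instead of random numbers.
VOCAB = ["good", "great", "product", "bad", "fake", "buy", "now"]
EMBEDDINGS = {w: [random.uniform(-1, 1) for _ in range(EMB_DIM)] for w in VOCAB}
UNKNOWN = [0.0] * EMB_DIM  # vector used for out-of-vocabulary tokens


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_gate():
    """Random weights for one gate: HID_DIM x (EMB_DIM + HID_DIM) matrix plus bias."""
    rows = [[random.uniform(-0.5, 0.5) for _ in range(EMB_DIM + HID_DIM)]
            for _ in range(HID_DIM)]
    return rows, [0.0] * HID_DIM


# One set of LSTM weights used by both branches; this weight sharing is
# what makes the network "Siamese".
SHARED_LSTM = {name: init_gate() for name in ("input", "forget", "output", "cell")}


def gate(name, xh, activation):
    """Apply one gate to the concatenated [input, previous hidden] vector."""
    weights, bias = SHARED_LSTM[name]
    return [activation(sum(w * v for w, v in zip(row, xh)) + b)
            for row, b in zip(weights, bias)]


def encode(text):
    """Run the shared LSTM over the tokens and return the final hidden state."""
    h = [0.0] * HID_DIM
    c = [0.0] * HID_DIM
    for token in text.lower().split():
        xh = EMBEDDINGS.get(token, UNKNOWN) + h  # concatenate input and hidden state
        i = gate("input", xh, sigmoid)
        f = gate("forget", xh, sigmoid)
        o = gate("output", xh, sigmoid)
        g = gate("cell", xh, math.tanh)
        c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
    return h


def similarity(text_a, text_b):
    """exp(-L1 distance) between the two encodings; the score lies in (0, 1]."""
    ha, hb = encode(text_a), encode(text_b)
    return math.exp(-sum(abs(a - b) for a, b in zip(ha, hb)))
```

Calling `similarity("good product", "great product")` returns a score between 0 and 1, and identical inputs score exactly 1. In a trained version of this setup, a generated review that scores low against its original could be treated as the kind of noisy sample the abstract warns about, although the filtering rule itself is an assumption here.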