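Readability
Computer science
Artificial intelligence
Social media
Natural language processing
Misinformation
Classifier (UML)
Convolutional neural network
Linguistics
Information retrieval
World Wide Web
Philosophy
Programming language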
Abstract
The COVID-19 pandemic created an infodemic, confronting people with a massive amount of information through social media such as Twitter and Instagram. These platforms have made information easy to circulate and have paved the way for information and misinformation to mix. One way to counter an infodemic is to limit the spread of false information and to filter fake news, reducing its negative impact on society. This article studies the properties of fake news in English and Persian using the textual information conveyed through the language of the news. To this end, we study properties of a text based on information theory, stylometric information from raw text, readability of the text, and linguistic information such as phonology, syntax, and morphology. We use the XLM-RoBERTa representation with a convolutional neural network classifier as the base model to detect English and Persian COVID-19 fake news. In addition, we propose different learning scenarios in which different feature sets are concatenated with the contextualized representation. According to the experimental results, adding any of these textual feature sets to the base model improves the performance of the classifier for both English and Persian. Readability and stylometric features were the most effective for detecting English fake news, improving performance by 2.72% in F-measure. Augmenting this feature setting with the information amount and morphological information improved the performance of the classifier by 3.79% in F-measure for Persian.
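The idea of concatenating handcrafted feature sets with a contextualized representation can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the abstract does not list the exact features, so character-level Shannon entropy stands in for the information-theoretic features, and average word and sentence length stand in for stylometric and readability features. The function names and the zero-filled placeholder embedding are hypothetical; in the study the representation comes from XLM-RoBERTa and the classifier is a convolutional neural network, neither of which is reproduced here.

```python
import math
import re
from collections import Counter


def char_entropy(text):
    """Shannon entropy (bits per character) of the character distribution."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def handcrafted_features(text):
    """Hypothetical feature set: entropy, average word length, average sentence length."""
    words = re.findall(r"\w+", text)  # \w matches Unicode letters, so Persian text works too
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    avg_word_len = sum(len(w) for w in words) / len(words) if words else 0.0
    avg_sent_len = len(words) / len(sentences) if sentences else 0.0
    return [char_entropy(text), avg_word_len, avg_sent_len]


def combine(contextual_vector, text):
    """Concatenate a contextualized representation with the handcrafted features."""
    return list(contextual_vector) + handcrafted_features(text)


# Placeholder for a contextualized embedding (assumption: a real one would
# come from an encoder such as XLM-RoBERTa and be much larger).
placeholder_embedding = [0.0] * 8
features = combine(placeholder_embedding, "Masks work. Wash your hands!")
print(len(features))
```

The combined vector would then be passed to a classifier. Whether the concatenation happens before or after the convolutional layers is not stated in the abstract, so this sketch leaves that step out.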