HateCircle and Unsupervised Hate Speech Detection Incorporating Emotion and Contextual Semantics

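Bengali; Hindi; Computer science; Emotion detection; Lexicon; Natural language processing; Artificial intelligence; Social media; Semantics (computer science); Coding (set theory); Voice activity detection; Speech recognition; Speech processing; Emotion recognition; World Wide Web; Programming language; Set (abstract data type)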

Authors

Sayani Ghosal,Amita Jain

Source

Journal: ACM Transactions on Asian and Low-Resource Language Information Processing; Date: 2022-12-19; Volume (issue) and pages: 22 (4): 1-28; Citations: 18

Abstract

The explosive growth of social media has greatly expanded freedom of speech online. This worldwide platform for the human voice makes it possible to attack other users without facing consequences and to flout social etiquette, resulting in an inevitable increase in hate speech. English hate speech detection is now a popular research area, but the prevalence of implicit hate content in regional languages calls for effective language-independent models. The proposed research is the first unsupervised Hindi and Bengali hate content detection framework, and it consists of three significant components: HateCircle, hate tweet classification, and code-switch data preparation algorithms. The novel HateCircle method detects the hate orientation of each term from co-occurrence patterns of words, contextual semantics, and emotion analysis. The multiclass hate tweet classification algorithm uses part-of-speech tagging, Euclidean distance, and the geometric median. Because hate content detection is more effective in the native script than in the Roman script, a transliteration algorithm is also proposed for code-switch data preparation. The experiments evaluate combinations of various lexicons with our enriched hate lexicon, which achieves a maximum F1-score of 0.74 on the Hindi dataset and 0.88 on the Bengali dataset. The HateCircle and hate tweet detection framework is evaluated with our proposed part-of-speech tagging and geometric median detection methods; it achieves a maximum accuracy of 0.73 on the Hindi dataset and 0.78 on the Bengali dataset. The experimental results indicate that contextual semantic hate speech detection with a language-independence feature can counter the growth of implicit abusive text on social media.
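
The abstract names Euclidean distance and the geometric median as ingredients of the tweet classification step, but it does not say how they are combined. The sketch below is only an illustration of those two building blocks, not the authors' algorithm. It assumes that each term of a tweet has already been mapped to a 2-D point, summarizes the points with their geometric median (approximated with Weiszfeld's iteration), and picks the nearest reference point by Euclidean distance. The names `geometric_median` and `nearest_label`, the 2-D representation, and the class anchors are all hypothetical.

```python
import math


def geometric_median(points, tol=1e-9, max_iter=1000, eps=1e-12):
    """Approximate the geometric median of `points` with Weiszfeld's iteration.

    The geometric median minimizes the sum of Euclidean distances to the
    points. `eps` clamps zero distances so the weights stay finite; this is
    a simplification, not a full treatment of the degenerate case.
    """
    dim = len(points[0])
    # Start from the centroid (arithmetic mean) of the points.
    current = [sum(p[i] for p in points) / len(points) for i in range(dim)]
    for _ in range(max_iter):
        # Each point is weighted by the inverse of its distance to the estimate.
        weights = [1.0 / max(math.dist(current, p), eps) for p in points]
        total = sum(weights)
        updated = [
            sum(w * p[i] for w, p in zip(weights, points)) / total
            for i in range(dim)
        ]
        if math.dist(updated, current) < tol:
            return updated
        current = updated
    return current


def nearest_label(point, anchors):
    """Return the label whose anchor is closest to `point` (Euclidean distance)."""
    return min(anchors, key=lambda label: math.dist(point, anchors[label]))


# Hypothetical term points for one tweet and hypothetical class anchors.
term_points = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
anchors = {"neutral": (0.0, 0.0), "hate": (1.0, 1.0)}

median = geometric_median(term_points)
label = nearest_label(median, anchors)
```

The geometric median is less sensitive to a single outlying term than the arithmetic mean, which is one plausible reason to prefer it when summarizing noisy per-term scores.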
