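Naive Bayes classifier
Computer science
Machine learning
Artificial intelligence
Python (programming language)
Classification
Bag-of-words model
Internet
Filter (signal processing)
Information retrieval
Data mining
Support vector machine
World Wide Web
Computer vision
Operating system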
Authors
N. Mageshkumar, A. Vijayaraj, N. Arunpriya, A. Sangeetha
Identifier
DOI:10.1016/j.matpr.2022.05.364
Abstract
Spam emails have long been a source of concern in the field of computer security. They are costly both monetarily and technologically, and they can be extremely harmful to computers and networks. Despite the rise of social networks and other Internet-based information exchange venues, email communication has become increasingly important over time, making the improvement of spam filters an urgent need. Although various spam filters have been developed to help prevent spam emails from reaching a user's mailbox, there has been little research into text modifications. Because of its simplicity and efficiency, Naive Bayes is currently one of the most widely used methods of spam classification. However, when emails contain leetspeak or diacritics, Naive Bayes is unable to categorize them correctly. In this proposal, we therefore present a novel method that improves the accuracy of the Naive Bayes spam filter so that it detects text alterations and correctly classifies emails as spam or ham. Our Python approach combines semantic, keyword, and machine learning algorithms to improve Naive Bayes accuracy when compared to SpamAssassin. Furthermore, we identified a link between email length and spam score, indicating that Bayesian poisoning, a contentious concept, is an actual technique used by spammers.
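The abstract does not describe the authors' implementation, so the following is only a minimal sketch of the general idea: undo leetspeak and diacritics before a bag-of-words Naive Bayes classifier sees the text. The substitution table `LEET_MAP`, the helper names, and the four training messages are all hypothetical and chosen purely for illustration. The classifier is a plain multinomial Naive Bayes with Laplace smoothing, not the combined semantic, keyword, and machine learning method the paper proposes.

```python
import math
import re
import unicodedata
from collections import Counter

# Hypothetical leetspeak substitutions (an assumption, not the paper's table).
# Note that this naive mapping also rewrites genuine numbers such as "2022".
LEET_MAP = str.maketrans({
    "0": "o", "1": "i", "3": "e", "4": "a",
    "5": "s", "7": "t", "@": "a", "$": "s",
})


def normalize(text):
    """Strip diacritics, lowercase, and undo simple leetspeak."""
    # NFKD splits accented letters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.lower().translate(LEET_MAP)


def tokenize(text):
    """Split normalized text into lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", normalize(text))


class NaiveBayes:
    """Multinomial Naive Bayes over word counts with Laplace smoothing."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = Counter()

    def train(self, text, label):
        self.doc_counts[label] += 1
        self.word_counts[label].update(tokenize(text))

    def predict(self, text):
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_docs = sum(self.doc_counts.values())
        tokens = tokenize(text)
        scores = {}
        for label, counts in self.word_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(counts.values()) + len(vocab)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)
            scores[label] = score
        return max(scores, key=scores.get)


# Toy training set, invented for this example.
model = NaiveBayes()
model.train("free money now", "spam")
model.train("cheap viagra free cash", "spam")
model.train("meeting agenda for monday", "ham")
model.train("lunch with the project team", "ham")

print(model.predict("FR33 M0N3Y and ca$h"))
print(model.predict("Agenda for the team meeting"))
```

Without the `normalize` step, tokens such as `fr33` or `m0n3y` would never match the vocabulary learned from clean text, which is the failure mode the abstract attributes to plain Naive Bayes.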