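Computer science
Support vector machine
Artificial intelligence
Naive Bayes classifier
Header
Random forest
Machine learning
Readability
Bag-of-words model
Classifier (UML)
Natural language processing
Computer network
Programming language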
Authors
Rushdi Shams, Robert E. Mercer
Identifier
DOI:10.1109/icdm.2013.131
Abstract
Supervised machine learning methods for classifying spam emails are long-established. Most of these methods use either header-based or content-based features. Spammers, however, can bypass these methods easily, especially the ones that deal with header features. In this paper, we report a novel spam classification method that uses features based on email content, namely language and readability, combined with the previously used content-based task features. The features are extracted from four benchmark datasets, viz. CSDMC2010, Spam Assassin, Ling Spam, and Enron-Spam. We use five well-known algorithms to induce our spam classifiers: Random Forest (RF), BAGGING, ADABOOSTM1, Support Vector Machine (SVM), and Naïve Bayes (NB). We evaluate the classifier performances and find that BAGGING performs the best. Moreover, its performance surpasses that of a number of state-of-the-art methods proposed in previous studies. Although applied only to English-language emails, the results indicate that our method may be an excellent means to classify spam emails in other languages as well.