Bigram
Computer science
Support vector machine
Word (group theory)
Artificial intelligence
Naive Bayes classifier
Simple (philosophy)
Snippet
Task (project management)
Machine learning
Sentiment analysis
Feature (linguistics)
Matching (statistics)
Pattern recognition (psychology)
Natural language processing
Data mining
Information retrieval
Mathematics
Statistics
Philosophy
Linguistics
Geometry
Trigram
Management
Epistemology
Economics
Authors
Sida Wang, Christopher D. Manning
Abstract
Variants of Naive Bayes (NB) and Support Vector Machines (SVM) are often used as baseline methods for text classification, but their performance varies greatly depending on the model variant, features used and task/dataset. We show that: (i) the inclusion of word bigram features gives consistent gains on sentiment analysis tasks; (ii) for short snippet sentiment tasks, NB actually does better than SVMs (while for longer documents the opposite result holds); (iii) a simple but novel SVM variant using NB log-count ratios as feature values consistently performs well across tasks and datasets. Based on these observations, we identify simple NB and SVM variants which outperform most published results on sentiment analysis datasets, sometimes providing a new state-of-the-art performance level.
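The SVM variant in point (iii) of the abstract replaces raw counts with Naive Bayes log-count ratios. Below is a minimal NumPy sketch of how such ratio features can be computed for a binary task, assuming binarized count vectors per document; the function name, the toy data, and the smoothing value are illustrative, not taken from the paper.

```python
import numpy as np

def log_count_ratio(X_pos, X_neg, alpha=1.0):
    """Log-count ratio r over the vocabulary.

    X_pos, X_neg: (docs x vocab) binarized count matrices for the
    positive and negative class; alpha is a smoothing constant.
    """
    p = alpha + X_pos.sum(axis=0)          # smoothed positive counts
    q = alpha + X_neg.sum(axis=0)          # smoothed negative counts
    # ratio of L1-normalized count vectors, taken elementwise in log space
    return np.log((p / p.sum()) / (q / q.sum()))

# Toy 4-word vocabulary: word 0 appears only in positive documents,
# word 3 only in negative documents.
X_pos = np.array([[1, 1, 0, 0],
                  [1, 0, 1, 0]])
X_neg = np.array([[0, 0, 1, 1],
                  [0, 1, 0, 1]])

r = log_count_ratio(X_pos, X_neg)
# A new document's SVM feature vector is the elementwise product of r
# with its binarized counts.
x_new = r * np.array([1, 0, 0, 1])
```

In this sketch, words seen mostly in positive documents get positive ratio values and words seen mostly in negative documents get negative ones, so a linear SVM trained on `r * f` inherits the Naive Bayes evidence per feature.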