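Computer science
Dimensionality reduction
Artificial intelligence
Machine learning
Naive Bayes classifier
Sentiment analysis
Benchmark (surveying)
Random forest
Principal component analysis
Decision tree
Feature (linguistics)
F1 score
Data mining
Support vector machine
Linguistics
Philosophy
Geodesy
Geography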
Authors
N. Dhamayanthi, B. Lavanya
Identifier
DOI: 10.14569/ijacsa.2024.0150678
Abstract
Sentiment analysis is vital for understanding public opinion, but improving its performance is challenging because of the complexity of high-dimensional text data and the diversity of user-generated content. We propose a novel framework, Dimensionality Reduction for Machine Learning (DRML), that improves classification performance by 21.55% while reducing the dimension of the feature matrix by 99.63%. Our research addresses a fundamental question: is it possible to shrink the feature space substantially while also improving sentiment analysis performance? Our approach employs Principal Component Analysis (PCA) to capture essential textual features and includes an algorithm for identifying principal components from positive and negative reviews. We then create a supervised dataset by combining these components. We further integrate a range of machine learning algorithms (Decision Tree, K-Nearest Neighbours, Bernoulli Naïve Bayes, and a Majority Voting Ensemble) into the framework, along with a custom tokenizer, to exploit the reduced-dimensional data for sentiment classification. Extensive experiments on gold-standard multi-domain benchmark datasets from Amazon show that DRML outperforms other state-of-the-art approaches. DRML achieves an average performance of 98.38%, an increase of 21.55% over the baseline methodology using Bag of Words (BoW). Across the individual evaluation metrics, DRML improves Accuracy by 21.84%, Precision by 20.4%, Recall by 21.84%, and F1-score by 22.11%. Compared with state-of-the-art (SOTA) methodologies applied to the same benchmark dataset in recent years, our framework increases Sentiment Analysis Accuracy by 10.96% on average.
This substantial improvement underscores the effectiveness of our approach. To conclude, our research contributes to the field of sentiment analysis by introducing an innovative framework that not only improves the efficiency of sentiment analysis but also paves the way for the analysis of extensive textual data in diverse real-world applications.
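The headline 21.55% gain appears to be the arithmetic mean of the four per-metric gains reported above: (21.84 + 20.4 + 21.84 + 22.11) / 4 = 21.5475, which rounds to 21.55. The short check below only re-does that arithmetic on the figures quoted in the abstract. It assumes the gains are averaged with equal weight, which the abstract does not state explicitly.

```python
# Per-metric gains of DRML over the BoW baseline, copied from the abstract.
# Assumption: the headline figure is an unweighted mean of these four values.
gains = {"Accuracy": 21.84, "Precision": 20.4, "Recall": 21.84, "F1-score": 22.11}

average_gain = sum(gains.values()) / len(gains)
print(f"average gain: {average_gain:.4f}")
```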
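The abstract does not say how the custom tokenizer works or exactly how the principal components of positive and negative reviews are combined. The following is therefore only a minimal sketch under stated assumptions, not the authors' DRML implementation. It does four things:

1. Builds a Bag of Words count matrix using a hypothetical regex tokenizer.
2. Runs PCA (via SVD) separately on the positive and on the negative training reviews.
3. Stacks the two sets of principal axes into one projection basis. This is our assumption about what "combining these components" means.
4. Classifies the reduced features with a hard majority vote.

To keep dependencies to NumPy only, the three voters (1-NN, 3-NN, nearest centroid) stand in for the paper's Decision Tree, K-Nearest Neighbours and Bernoulli Naïve Bayes. All function names and the toy reviews are hypothetical.

```python
import re
from collections import Counter

import numpy as np


def tokenize(text):
    # Hypothetical stand-in for the paper's custom tokenizer:
    # lowercase the text and keep runs of letters and apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def bag_of_words(docs, vocab):
    # Document-term count matrix over a fixed vocabulary (the BoW representation).
    index = {tok: j for j, tok in enumerate(vocab)}
    X = np.zeros((len(docs), len(vocab)))
    for i, doc in enumerate(docs):
        for tok, n in Counter(tokenize(doc)).items():
            if tok in index:
                X[i, index[tok]] = n
    return X


def pca_axes(X, k):
    # Top-k principal axes of the mean-centred matrix, obtained via SVD.
    _, _, vt = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
    return vt[:k]


def class_wise_basis(X, y, k):
    # Assumption: take k axes from positive (1) and k from negative (0) reviews
    # and stack them into a single 2k-dimensional projection basis.
    axes = np.vstack([pca_axes(X[y == 1], k), pca_axes(X[y == 0], k)])
    return X.mean(axis=0), axes


def knn_predict(Xtr, ytr, Xte, k):
    # k-nearest neighbours (Euclidean); majority label among the k neighbours.
    d = np.linalg.norm(Xte[:, None, :] - Xtr[None, :, :], axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]
    return np.array([np.bincount(ytr[row], minlength=2).argmax() for row in nearest])


def centroid_predict(Xtr, ytr, Xte):
    # Assign each sample to the nearer of the two class centroids.
    cents = np.stack([Xtr[ytr == c].mean(axis=0) for c in (0, 1)])
    d = np.linalg.norm(Xte[:, None, :] - cents[None, :, :], axis=2)
    return d.argmin(axis=1)


def majority_vote(*predictions):
    # Hard voting for binary labels: predict 1 when more than half the voters say 1.
    votes = np.stack(predictions)  # shape (n_classifiers, n_samples)
    return (votes.sum(axis=0) * 2 > len(predictions)).astype(int)


# Toy reviews (hypothetical), labelled 1 = positive, 0 = negative.
train_docs = [
    "great product works perfectly", "love it, excellent quality",
    "excellent value, great sound", "works great, love the design",
    "terrible, broke after a day", "awful quality, waste of money",
    "broke quickly, terrible support", "waste of money, awful sound",
]
y_train = np.array([1, 1, 1, 1, 0, 0, 0, 0])
test_docs = ["great quality, love it", "awful, broke and terrible"]

vocab = sorted({t for d in train_docs for t in tokenize(d)})
X_train = bag_of_words(train_docs, vocab)
X_test = bag_of_words(test_docs, vocab)

mean, axes = class_wise_basis(X_train, y_train, k=2)
Z_train = (X_train - mean) @ axes.T  # reduced training features
Z_test = (X_test - mean) @ axes.T    # test reviews projected onto the same basis

pred = majority_vote(
    knn_predict(Z_train, y_train, Z_test, k=1),
    knn_predict(Z_train, y_train, Z_test, k=3),
    centroid_predict(Z_train, y_train, Z_test),
)
print(X_train.shape[1], "->", Z_train.shape[1], "features; predictions:", pred)
```

With two classes and an odd number of voters, hard voting can never tie. In practice one would likely replace the stand-in voters with library implementations of the classifiers named in the abstract. Fitting the projection on training reviews only and reusing it for test reviews avoids leaking test data into the reduced feature space.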