Oversampling
Outlier
Computer science
Artificial intelligence
F1 score
Support vector machine
Naive Bayes classifier
Data mining
Machine learning
Precision and recall
Class (philosophy)
Pattern recognition (psychology)
Computer network
Bandwidth (computing)
Authors
Hanifatul Insan,Sri Suryani Prasetiyowati,Yuliant Sibaroni
Identifier
DOI: 10.1109/icicyta60173.2023.10428902
Abstract
Dealing with data imbalance and outliers is an important challenge in data classification. The aim of this study is to improve classification performance by reducing the effects of class imbalance and the presence of outliers in the dataset. SMOTE-LOF combines the SMOTE oversampling method with the Local Outlier Factor (LOF) to create synthetic samples while also accounting for potential outliers. Borderline-SMOTE, in contrast, identifies "borderline" samples in the minority class and then creates synthetic samples along the border between the majority and minority classes. In this study, experiments were conducted with classification algorithms such as Naïve Bayes and Support Vector Machine on datasets that are imbalanced and contain outliers: Pima Indians, Haberman, Glass, and Rainfall. The experimental scenario includes a comparison with previous work on SMOTE-LOF and Borderline-SMOTE using the Rainfall dataset. The results showed that on the Pima Indians, Haberman, and Glass datasets, Borderline-SMOTE outperformed SMOTE-LOF across all three classifiers, by an average of 4-6% in accuracy, 2-4% in precision, 5-10% in recall, and 5-6% in F1 score. When the technique was applied to the Rainfall dataset, accuracy increased by 10-25%. These outcomes consistently demonstrate that, on the Pima Indians, Haberman, and Glass datasets, Borderline-SMOTE improves the performance of several classification algorithms, as evidenced by higher accuracy, precision, recall, and F1 scores compared with SMOTE-LOF.
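To make the comparison concrete, below is a minimal Python sketch (not the authors' code) that contrasts Borderline-SMOTE from imbalanced-learn with one plausible reading of SMOTE-LOF, built here as SMOTE oversampling followed by Local Outlier Factor filtering from scikit-learn. The synthetic dataset, the LOF filtering step, and all parameter values are illustrative assumptions, not details taken from the paper.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.neighbors import LocalOutlierFactor
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from imblearn.over_sampling import SMOTE, BorderlineSMOTE

# Imbalanced toy data standing in for Pima Indians / Haberman / Glass / Rainfall.
X, y = make_classification(n_samples=1000, n_features=8,
                           weights=[0.9, 0.1], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, test_size=0.3, random_state=42)

def smote_lof(X, y, n_neighbors=20):
    """Assumed reading of SMOTE-LOF: oversample the minority class with SMOTE,
    then drop any resampled points that LOF flags as outliers (label -1)."""
    X_res, y_res = SMOTE(random_state=42).fit_resample(X, y)
    keep = LocalOutlierFactor(n_neighbors=n_neighbors).fit_predict(X_res) == 1
    return X_res[keep], y_res[keep]

resamplers = {
    "SMOTE-LOF (sketch)": smote_lof,
    "Borderline-SMOTE": lambda X, y: BorderlineSMOTE(random_state=42).fit_resample(X, y),
}
classifiers = {"Naive Bayes": GaussianNB(), "SVM": SVC()}

# Train each classifier on each resampled training set; evaluate on the
# untouched test split with the four metrics reported in the abstract.
for r_name, resample in resamplers.items():
    X_res, y_res = resample(X_train, y_train)
    for c_name, clf in classifiers.items():
        y_pred = clf.fit(X_res, y_res).predict(X_test)
        print(f"{r_name} + {c_name}: "
              f"acc={accuracy_score(y_test, y_pred):.3f} "
              f"prec={precision_score(y_test, y_pred):.3f} "
              f"rec={recall_score(y_test, y_pred):.3f} "
              f"f1={f1_score(y_test, y_pred):.3f}")
```

Note that only the test split is left unresampled, so the reported metrics reflect performance on the original class distribution; how the paper applies LOF within SMOTE-LOF may differ from the post-hoc filtering assumed here.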