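Oversampling
Computer science
Skew
Interpolation (computer graphics)
Sampling (signal processing)
Synthetic data
Set (abstract data type)
Artificial intelligence
Class (philosophy)
Support vector machine
Sample (material)
Data mining
Machine learning
Pattern recognition (psychology)
Algorithm
Motion (physics)
Filter (signal processing)
Telecommunications
Chemistry
Chromatography
Computer vision
Programming language
Computer network
Bandwidth (computing)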
Authors
Hongrui Li, Shuangxin Wang, Jiading Jiang, Chuiyi Deng, Junmei Ou, Ziang Zhou, Dingli Yu
Source
Journal: Neurocomputing
[Elsevier BV]
Date: 2024-03-24
Volume/Issue: 583: 127600-127600
Citations: 1
Identifier
DOI: 10.1016/j.neucom.2024.127600
Abstract
The problem of class imbalance is prevalent in many real-world datasets, causing learning models to skew towards the majority class and resulting in biased performance. Data augmentation methods, such as the well-known Synthetic Minority Over-sampling Technique (SMOTE), are commonly employed to address class imbalance by generating synthetic samples. However, the generation mechanism of SMOTE is relatively constrained, resulting in insufficient diversity in the synthetic samples. To overcome this limitation, this paper extends the classical SMOTE and introduces a novel generalized version, namely Multi-vector Stochastic Exploration Oversampling (MSEO). It broadens the set of mapped synthetic samples, originally formed by the direction vectors and scaling vectors determined by the neighboring samples, to a collection obtained through mappings with random direction vectors and scaling vectors. This allows the generated samples to escape the original linear interpolation region, facilitating a more flexible exploration of the sample space. We extensively evaluated the method on various types of datasets, including artificially generated datasets, multi-class real-world datasets, and an engineering dataset. The results indicate that MSEO exhibits significant advantages in enhancing classification performance and promoting diversity in synthetic samples.
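To make the contrast in the abstract concrete, the sketch below compares classical SMOTE interpolation with a randomized variant. The first function is the standard SMOTE rule (a point on the segment between a minority sample and one of its neighbors). The second is only a hypothetical illustration of "random direction vectors and scaling vectors": the abstract does not give the MSEO update rule, so the Dirichlet-weighted direction and the per-feature uniform scaling are assumptions made here, as are the function names.

```python
import numpy as np


def smote_sample(x, neighbor, rng):
    """Classical SMOTE: a point on the segment between x and one neighbor."""
    lam = rng.uniform(0.0, 1.0)  # single scalar scaling factor in [0, 1]
    return x + lam * (neighbor - x)


def random_direction_sample(x, neighbors, rng):
    """Hypothetical randomized variant (NOT the paper's exact rule).

    Assumption 1: the direction is a random convex combination of the
    difference vectors to several neighbors.
    Assumption 2: the scaling is a vector with one independent factor
    per feature, so the result can leave the x-to-neighbor segment.
    """
    weights = rng.dirichlet(np.ones(len(neighbors)))  # non-negative, sums to 1
    direction = weights @ (neighbors - x)             # random direction vector
    scale = rng.uniform(0.0, 1.0, size=x.shape)       # per-feature scaling vector
    return x + scale * direction
```

Under these assumptions the randomized sample still stays inside the axis-aligned bounding box of the sample and its neighbors, but it is no longer confined to a single line segment, which is the kind of added diversity the abstract describes.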