
A Study on the Impact of Data Characteristics in Imbalanced Regression Tasks

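Computer science, Machine learning, Regression, Artificial intelligence, Regression analysis, Statistics, Mathematics
Authors
Paula Branco, Luís Torgo
Source
Conference: IEEE International Conference on Data Science and Advanced Analytics (DSAA 2019), pp. 193-202. Cited by: 4
Identifier
DOI:10.1109/dsaa.2019.00034
Abstract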

The class imbalance problem has been thoroughly studied over the past two decades. More recently, the research community has recognized that imbalanced distributions also occur in tasks other than classification. Regression is among these newly studied tasks, and imbalanced domains pose important challenges there too. Imbalanced regression problems arise in a variety of real-world domains, such as meteorology (predicting extreme weather values), finance (forecasting extreme stock returns) and medicine (anticipating rare values). In imbalanced regression, the end-user's preferences are biased towards values of the target variable that are under-represented in the available data. Several pre-processing methods have been proposed to address this problem. These methods change the training set to force the learner to focus on the rare cases. However, as far as we know, the relationship between the intrinsic characteristics of the data and the performance achieved by these methods has not yet been studied for imbalanced regression tasks. In this paper we describe a study of the impact that certain data characteristics may have on the results of applying pre-processing methods to imbalanced regression problems. To achieve this goal, we define potentially interesting data characteristics of regression problems. We then conduct our study using a synthetic data repository built for this purpose. We show that each of the characteristics studied behaves differently, and that this behaviour depends on the level at which the characteristic is present and on the learning algorithm used.
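The abstract says that pre-processing methods change the training set so that the learner focuses on rare target values, but it does not name the methods. Below is a minimal sketch of one such strategy, random undersampling guided by a relevance score. It assumes a hypothetical relevance function `relevance`, which is 0 at the median target and rises linearly to 1 at the Tukey boxplot fences. It also assumes a hypothetical routine `random_undersample`, which keeps every case whose relevance is at or above a threshold and only a random fraction of the remaining "normal" cases. The names and the parameter values (`threshold=0.8`, `keep_frac=0.3`) are illustrative assumptions, not the procedure evaluated in the paper.

```python
import numpy as np


def relevance(y):
    """Hypothetical relevance function phi(y) in [0, 1].

    Assumption: phi is 0 at the median target value and rises linearly to 1
    at the Tukey boxplot fences (Q1 - 1.5*IQR and Q3 + 1.5*IQR); values
    beyond a fence are clipped to 1. Both tails count as relevant.
    """
    y = np.asarray(y, dtype=float)
    q1, med, q3 = np.quantile(y, [0.25, 0.5, 0.75])
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    phi = np.zeros_like(y)
    above, below = y > med, y < med
    # Linear ramp from the median (phi = 0) to each fence (phi = 1).
    if hi_fence > med:
        phi[above] = np.clip((y[above] - med) / (hi_fence - med), 0.0, 1.0)
    if med > lo_fence:
        phi[below] = np.clip((med - y[below]) / (med - lo_fence), 0.0, 1.0)
    return phi


def random_undersample(X, y, threshold=0.8, keep_frac=0.3, seed=0):
    """Keep every rare case (phi >= threshold) and a random keep_frac of the rest."""
    rng = np.random.default_rng(seed)
    rare = relevance(y) >= threshold
    normal_idx = np.flatnonzero(~rare)
    n_keep = int(round(keep_frac * normal_idx.size))
    kept_normal = rng.choice(normal_idx, size=n_keep, replace=False)
    idx = np.sort(np.concatenate([np.flatnonzero(rare), kept_normal]))
    return X[idx], y[idx]


# Usage on a heavy-tailed synthetic target, where extreme values are scarce.
rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 3))
y = X[:, 0] + rng.standard_t(df=2, size=1000)
X_res, y_res = random_undersample(X, y)
print(len(y), "->", len(y_res))
```

Keeping all high-relevance cases while thinning the frequent ones shifts the training distribution towards the extremes. Oversampling variants reach a similar effect by replicating or synthesizing rare cases instead.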
The main contributions of our work are: i) defining interesting data characteristics for regression tasks; ii) creating the first repository of imbalanced regression tasks, containing 6000 data sets with controlled data characteristics; and iii) providing insights into the impact of intrinsic data characteristics on the results of pre-processing methods for handling imbalanced regression tasks.
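The abstract does not list which characteristics were controlled or how the 6000 data sets were generated. The sketch below is a purely hypothetical illustration of what a repository with controlled characteristics can look like. It varies two assumed characteristics over a small grid: the fraction of rare (extreme-target) cases and the target noise level. The generator `make_dataset`, its target function `exp(2 * x0) + x1`, and the chosen levels are all assumptions made for illustration.

```python
import numpy as np


def make_dataset(n=1000, rare_frac=0.05, noise_sd=0.5, seed=0):
    """Hypothetical generator with two controlled characteristics.

    rare_frac: fraction of cases drawn from the input region x0 in [1, 2),
               where the target exp(2 * x0) takes extreme values.
    noise_sd:  standard deviation of Gaussian noise added to the target.
    """
    rng = np.random.default_rng(seed)
    n_rare = int(round(rare_frac * n))
    # Normal cases have x0 in [0, 1); rare cases have x0 in [1, 2).
    x0 = np.concatenate([rng.uniform(0.0, 1.0, n - n_rare),
                         rng.uniform(1.0, 2.0, n_rare)])
    x1 = rng.uniform(0.0, 1.0, n)
    X = np.column_stack([x0, x1])
    y = np.exp(2.0 * x0) + x1 + rng.normal(0.0, noise_sd, n)
    # The rare cases were placed at the end of the arrays above.
    is_rare = np.arange(n) >= n - n_rare
    return X, y, is_rare


# A small grid of data sets, one per combination of characteristic levels.
repository = {
    (rare_frac, noise_sd): make_dataset(rare_frac=rare_frac, noise_sd=noise_sd)
    for rare_frac in (0.01, 0.05, 0.10)
    for noise_sd in (0.0, 0.5, 1.0)
}
print(len(repository))  # 9
```

Because each key records the level of every controlled characteristic, the results of a pre-processing method can later be grouped by characteristic level and by learning algorithm.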
