A unifying view of class overlap and imbalance: Key concepts, multi-view panorama, and open avenues for research

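Computer science, Class (philosophy), Data science, Panorama, Identification (biology), Key (lock), Artificial intelligence, Taxonomy (biology), Machine learning, Computer security, Biology, Botany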
Authors
Miriam Seoane Santos,Pedro Henriques Abreu,Nathalie Japkowicz,Alberto Fernández,João Santos
Source
Journal: Information Fusion [Elsevier BV]
Volume: 89, pages 228-253; cited by: 53
Identifier
DOI:10.1016/j.inffus.2022.08.017
Abstract

The combination of class imbalance and overlap is currently one of the most challenging issues in machine learning. While seminal work focused on establishing class overlap as a complicating factor for classification tasks in imbalanced domains, ongoing research mostly concerns the study of their synergy in real-world applications. However, given the lack of a well-formulated definition and measurement of class overlap in real-world domains, especially in the presence of class imbalance, the research community has not yet reached a consensus on the characterisation of both problems. This naturally complicates the evaluation of existing approaches that address these issues simultaneously and prevents future research from moving towards the design of specialised solutions. In this work, we advocate for a unified view of the problem of class overlap in imbalanced domains. Acknowledging class overlap as the overarching problem – since it has proven to be more harmful for classification tasks than class imbalance – we start by discussing the key concepts associated with its definition, identification, and measurement in real-world domains, while advocating for a characterisation of the problem that attends to multiple sources of complexity. We then provide an overview of existing data complexity measures and link them to the specific types of class overlap problems they cover, proposing a novel taxonomy of class overlap complexity measures. Additionally, we characterise the relationships between measures and the insights they provide, and discuss to what extent they account for class imbalance. Finally, we systematise the current body of knowledge on the topic across several branches of Machine Learning (Data Analysis, Data Preprocessing, Algorithm Design, and Meta-learning), identifying existing limitations and discussing possible lines for future research.
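To make the abstract's point about complexity measures and class imbalance more concrete, the following is a minimal sketch and is not taken from the paper. It computes an imbalance ratio and a leave-one-out 1-nearest-neighbour error rate, in the spirit of the nearest-neighbour error measure often called N3 in the data complexity literature, both overall and per class. The function names and the toy data are assumptions made purely for illustration.

```python
from collections import Counter
from math import dist


def imbalance_ratio(labels):
    """Majority class size divided by minority class size."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def loo_1nn_error(points, labels):
    """Leave-one-out 1-NN error rate, overall and per class.

    A point counts as an error when its nearest other point
    carries a different label.
    """
    errors = Counter()
    for i, (p, y) in enumerate(zip(points, labels)):
        # Nearest neighbour among all other points (ties go to the lowest index).
        j = min((k for k in range(len(points)) if k != i),
                key=lambda k: dist(p, points[k]))
        if labels[j] != y:
            errors[y] += 1
    counts = Counter(labels)
    overall = sum(errors.values()) / len(points)
    per_class = {c: errors[c] / n for c, n in counts.items()}
    return overall, per_class


# Hypothetical 1-D toy data: 8 majority points and 2 minority points,
# one of which lies inside the majority region.
points = [(0.0,), (1.0,), (2.0,), (3.0,), (4.0,),
          (5.0,), (6.0,), (7.0,), (3.4,), (9.0,)]
labels = ["maj"] * 8 + ["min"] * 2

overall, per_class = loo_1nn_error(points, labels)
print(imbalance_ratio(labels), overall, per_class)
```

On this toy data the overall error rate is moderate, while the per-class breakdown shows every minority point being misclassified. That gap is one simple way to see why an overlap measure that ignores class membership can understate the difficulty faced by the minority class.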