A unifying view of class overlap and imbalance: Key concepts, multi-view panorama, and open avenues for research

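Computer science, Class (philosophy), Data science, Panorama, Identification (biology), Key (lock), Artificial intelligence, Taxonomy (biology), Machine learning, Computer security, Biology, Botany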

Authors

Miriam Seoane Santos, Pedro Henriques Abreu, Nathalie Japkowicz, Alberto Fernández, João Santos

Source

Journal: Information Fusion [Elsevier BV]
Date: 2022-08-20 Volume: 89, pages 228-253 Citations: 53

Identifier

DOI: 10.1016/j.inffus.2022.08.017

Abstract

The combination of class imbalance and overlap is currently one of the most challenging issues in machine learning. While seminal work focused on establishing class overlap as a complicating factor for classification tasks in imbalanced domains, ongoing research mostly concerns the study of their synergy in real-world applications. However, given the lack of a well-formulated definition and measurement of class overlap in real-world domains, especially in the presence of class imbalance, the research community has not yet reached a consensus on the characterisation of both problems. This naturally complicates the evaluation of existing approaches that address these issues simultaneously and prevents future research from moving towards the design of specialised solutions. In this work, we advocate for a unified view of the problem of class overlap in imbalanced domains. Acknowledging class overlap as the overarching problem – since it has proven to be more harmful for classification tasks than class imbalance – we start by discussing the key concepts associated with its definition, identification, and measurement in real-world domains, while advocating for a characterisation of the problem that attends to multiple sources of complexity. We then provide an overview of existing data complexity measures and link them to the specific types of class overlap problems they cover, proposing a novel taxonomy of class overlap complexity measures. Additionally, we characterise the relationship between measures and the insights they provide, and discuss to what extent they account for class imbalance. Finally, we systematise the current body of knowledge on the topic across several branches of Machine Learning (Data Analysis, Data Preprocessing, Algorithm Design, and Meta-learning), identifying existing limitations and discussing possible lines for future research.
