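Ball (mathematics)
Metric (unit)
Computer science
Measure (data warehouse)
Data mining
Artificial intelligence
Feature selection
Algorithm
Synthetic data
Mathematics
Hilbert space
A priori and a posteriori
Metric space
Experimental data
Theoretical computer science
Data processing
Noisy data
Data reduction
Metric system
Feature (linguistics)
Pattern recognition (psychology)
Machine learning
Tree (set theory)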
Authors
Che, Menglu; Li, Ting; Pan, Wenliang; Wang, Xueqin; Zhang, Heping
Source
Journal: La Trobe University - OPAL (Open@LaTrobe)
Date: 2025-01-01
Identifiers
DOI: 10.6084/m9.figshare.30850570.v1
Abstract
Data in various domains, such as neuroimaging and network data analysis, often come in complex forms that lack a Hilbert space structure. This complexity calls for new approaches to effective analysis. We propose a novel measure of heterogeneity, ball impurity, designed to work with complex non-Euclidean objects. Our approach extends the notion of impurity to general metric spaces, providing a versatile tool for feature selection and tree models. The ball impurity measure has desirable properties, such as satisfying the triangle inequality, and is computationally tractable, which makes it practical to use. Extensive experiments on synthetic data and on real data from the UK Biobank validate the efficacy of our approach in capturing data heterogeneity. Our results compare favorably with state-of-the-art methods in metric spaces, highlighting the potential of ball impurity as a valuable tool for complex data analysis tasks.
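The abstract does not state the formula for ball impurity, so the sketch below does not attempt to reproduce it. It only illustrates the general idea that an impurity can be computed from pairwise distances alone, which is what allows such a measure to work on non-Euclidean objects and to drive split selection in tree models. All names (`distance_impurity`, `discrete_metric`, `split_gain`) are hypothetical, and the mean-pairwise-distance score is a simple stand-in, not the authors' measure. One useful check on the idea: under the discrete metric, this score reduces exactly to the classical Gini impurity.

```python
from collections import Counter
from itertools import product
from typing import Callable, Hashable, Sequence, TypeVar

T = TypeVar("T")
Metric = Callable[[T, T], float]


def distance_impurity(xs: Sequence[T], d: Metric) -> float:
    """Mean distance over all ordered pairs (x_i, x_j), i == j included.

    Hypothetical stand-in; this is NOT the ball impurity defined in the paper.
    It uses nothing but the metric d, so it applies to non-Euclidean objects.
    """
    n = len(xs)
    if n == 0:
        return 0.0
    return sum(d(a, b) for a, b in product(xs, repeat=2)) / (n * n)


def discrete_metric(a: Hashable, b: Hashable) -> float:
    """d(a, b) = 0 if a == b else 1, a valid metric on any set."""
    return 0.0 if a == b else 1.0


def gini(labels: Sequence[Hashable]) -> float:
    """Classical Gini impurity 1 - sum_k p_k^2, for comparison."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_gain(xs: Sequence[T], go_left: Sequence[bool], d: Metric) -> float:
    """Impurity decrease of a binary split (parent minus size-weighted children)."""
    left = [x for x, g in zip(xs, go_left) if g]
    right = [x for x, g in zip(xs, go_left) if not g]
    n = len(xs)
    child = (len(left) * distance_impurity(left, d)
             + len(right) * distance_impurity(right, d)) / n
    return distance_impurity(xs, d) - child


if __name__ == "__main__":
    labels = ["a", "a", "b", "c"]
    # Both print 0.625: with the discrete metric the score equals Gini impurity.
    print(gini(labels), distance_impurity(labels, discrete_metric))

    # Real-valued objects with the metric |a - b|: the split that separates
    # the two clusters reduces impurity far more than one that mixes them.
    xs = [0.0, 0.1, 5.0, 5.1]
    absdist = lambda a, b: abs(a - b)
    print(split_gain(xs, [True, True, False, False], absdist))
    print(split_gain(xs, [True, False, True, False], absdist))
```

The design choice here is deliberate: because the score only ever calls `d`, swapping in a different metric (for example a graph distance between networks) requires no change to the impurity or split code.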