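Computer science
Directed acyclic graph
Bayesian network
Probabilistic logic
Labeled data
Encoding
Machine learning
Artificial intelligence
Scalability
Bayesian probability
Data mining
Classifier (UML)
Algorithm
Biochemistry
Chemistry
Database
Gene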
Authors
Lei Guo,Limin Wang,Qilong Li,Kuo Li
Source
Journal: IEEE Transactions on Big Data
[Institute of Electrical and Electronics Engineers]
Date: 2023-01-01
Volume/Issue: 1-14
Identifier
DOI:10.1109/tbdata.2023.3338019
Abstract
Training learners on unbalanced data with asymmetric costs has been recognized as one of the most significant challenges in data mining. The Bayesian network classifier (BNC) provides a powerful probabilistic tool for encoding the probabilistic dependencies among random variables in a directed acyclic graph (DAG), but unbalanced data results in an unbalanced network topology. This biases the estimate of the conditional or joint probability distribution and ultimately reduces classification accuracy. To address this issue, we propose to redefine the information-theoretic metrics so that they uniformly represent the balanced dependencies between attributes or between attribute values. A heuristic search strategy and a thresholding operation are then introduced to learn refined DAGs from labeled and unlabeled data, respectively. Experimental results on 32 benchmark datasets show that the proposed highly scalable algorithm is competitive with or superior to a number of state-of-the-art single and ensemble learners.
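For orientation, the sketch below computes the conventional empirical conditional mutual information I(Xi; Xj | C), a metric commonly used to score attribute dependencies when learning BNC structures. It is not the redefined metric proposed in the paper, whose form the abstract does not give; the function name `conditional_mutual_information` and the toy data are assumptions made for illustration only.

```python
from collections import Counter
from math import log2


def conditional_mutual_information(xi, xj, c):
    """Empirical I(Xi; Xj | C) in bits for three equal-length discrete sequences.

    Conventional definition only; the paper's balanced variant is not shown.
    """
    n = len(c)
    n_ijc = Counter(zip(xi, xj, c))  # joint counts of (xi, xj, class)
    n_ic = Counter(zip(xi, c))       # counts of (xi, class)
    n_jc = Counter(zip(xj, c))       # counts of (xj, class)
    n_c = Counter(c)                 # class counts
    cmi = 0.0
    for (a, b, y), count in n_ijc.items():
        # P(a, b, y) * log2( P(a, b | y) / (P(a | y) * P(b | y)) )
        ratio = count * n_c[y] / (n_ic[(a, y)] * n_jc[(b, y)])
        cmi += (count / n) * log2(ratio)
    return cmi


# Toy data (hypothetical): one class value, two binary attributes.
labels = [0, 0, 0, 0]
identical = conditional_mutual_information([0, 1, 0, 1], [0, 1, 0, 1], labels)
independent = conditional_mutual_information([0, 0, 1, 1], [0, 1, 0, 1], labels)
```

Because each term is weighted by the joint frequency of the attribute values and the class, instances of a rare class contribute little to the total, which is one way to see how a skewed class distribution can dominate the dependencies a structure learner picks up.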