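Computer science
Feature selection
Data mining
Feature (linguistics)
Data stream
Conditional independence
Independence (probability theory)
Markov chain
Selection (genetic algorithm)
Machine learning
Concept drift
Streaming data
Artificial intelligence
Reliability (semiconductor)
Boundary (topology)
Data stream mining
Mathematics
Mathematical analysis
Philosophy
Physics
Statistics
Power (physics)
Linguistics
Telecommunications
Quantum mechanics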
Authors
Xingyu Wu, Bingbing Jiang, Xiangyu Wang, Taiyu Ban, Huanhuan Chen
Identifier
DOI:10.1109/tnnls.2023.3249767
Abstract
Recent years have witnessed the proliferation of streaming data mining techniques to meet the demands of many real-time systems, where high-dimensional streaming data are generated at high speed, increasing the burden on both hardware and software. Several feature selection algorithms for streaming data have been proposed to tackle this issue. However, these algorithms do not account for the distribution shift that arises in nonstationary scenarios, leading to performance degradation when the underlying distribution of the data stream changes. To solve this problem, this article investigates feature selection in streaming data through incremental Markov boundary (MB) learning and proposes a novel algorithm. Unlike existing algorithms, which focus on prediction performance on off-line data, the proposed approach learns the MB by analyzing conditional dependence/independence in the data, which uncovers the underlying mechanism and is naturally more robust against distribution shift. To learn the MB in a data stream, the proposal transforms the information learned in previous data blocks into prior knowledge and uses it to assist MB discovery in the current data block, while monitoring the likelihood of distribution shift and the reliability of the conditional independence tests to avoid the negative impact of invalid prior information. Extensive experiments on synthetic and real-world datasets demonstrate the superiority of the proposed algorithm.
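To make the idea of learning an MB from conditional independence tests concrete, the following is a minimal sketch of a generic grow-shrink procedure on a single data block. It is not the authors' algorithm: the incremental use of prior knowledge across blocks and the monitoring of distribution shift described in the abstract are not shown. The Fisher z-test, the linear-Gaussian toy data, the significance level, and all names (`make_block`, `ci_pvalue`, `grow_shrink_mb`) are assumptions chosen for illustration.

```python
import math
import random
from statistics import NormalDist


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(data, x, y, cond):
    """Partial correlation of x and y given cond, via the recursive formula.

    Cost grows exponentially with len(cond); fine for small conditioning sets.
    """
    if not cond:
        return pearson(data[x], data[y])
    z, rest = cond[0], cond[1:]
    r_xy = partial_corr(data, x, y, rest)
    r_xz = partial_corr(data, x, z, rest)
    r_yz = partial_corr(data, y, z, rest)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def ci_pvalue(data, x, y, cond=()):
    """Two-sided p-value of a Fisher z-test for x independent of y given cond."""
    n = len(data[x])
    r = partial_corr(data, x, y, tuple(cond))
    r = max(min(r, 0.999999), -0.999999)  # keep the log finite
    z = 0.5 * math.log((1 + r) / (1 - r)) * math.sqrt(n - len(cond) - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def grow_shrink_mb(data, target, alpha=0.001):
    """Generic grow-shrink Markov boundary search (illustrative only)."""
    mb = []
    changed = True
    while changed:  # grow: add variables still dependent on the target
        changed = False
        for v in data:
            if v != target and v not in mb and ci_pvalue(data, target, v, mb) < alpha:
                mb.append(v)
                changed = True
    for v in list(mb):  # shrink: drop variables made redundant by the rest
        rest = [u for u in mb if u != v]
        if ci_pvalue(data, target, v, rest) >= alpha:
            mb.remove(v)
    return set(mb)


def make_block(n, seed):
    """Toy linear-Gaussian block: E -> A -> T -> C, with D as pure noise."""
    rng = random.Random(seed)
    block = {name: [] for name in "EATCD"}
    for _ in range(n):
        e = rng.gauss(0, 1)
        a = 0.8 * e + rng.gauss(0, 1)
        t = 0.8 * a + rng.gauss(0, 1)
        c = 0.8 * t + rng.gauss(0, 1)
        d = rng.gauss(0, 1)
        for name, value in zip("EATCD", (e, a, t, c, d)):
            block[name].append(value)
    return block


block = make_block(2000, seed=0)
print(sorted(grow_shrink_mb(block, "T")))
```

In this toy structure the true MB of `T` is its parent `A` and its child `C`. `E` is correlated with `T` but becomes independent once `A` is known, so the shrink phase is expected to discard it in most runs; because the tests are statistical, an occasional false inclusion or exclusion is possible.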