Keywords
Machine learning, Decision tree learning, Random forest, Gradient boosted decision trees, Retraining, Parametric statistics, Alternating decision tree, Incremental decision tree, Computational complexity theory, Algorithms, Artificial intelligence, Computer science, Statistics, Mathematics
Authors
Boris Sharchilev, Yury Ustinovsky, Pavel Serdyukov, Maarten de Rijke
Source
Journal: Cornell University - arXiv
Date: 2018-01-01
Citations: 18
Identifier
DOI: 10.48550/arxiv.1802.06640
Abstract
We address the problem of finding influential training samples for a particular case of tree ensemble-based models, e.g., Random Forest (RF) or Gradient Boosted Decision Trees (GBDT). A natural way of formalizing this problem is studying how the model's predictions change upon leave-one-out retraining, leaving out each individual training sample. Recent work has shown that, for parametric models, this analysis can be conducted in a computationally efficient way. We propose several ways of extending this framework to non-parametric GBDT ensembles under the assumption that tree structures remain fixed. Furthermore, we introduce a general scheme of obtaining further approximations to our method that balance the trade-off between performance and computational complexity. We evaluate our approaches on various experimental setups and use-case scenarios and demonstrate both the quality of our approach to finding influential training samples in comparison to the baselines and its computational efficiency.
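To make the leave-one-out notion concrete: the quantity the abstract formalizes is, for each training sample, the change in the model's prediction when that sample is removed and the ensemble is retrained. Below is a minimal brute-force sketch of that definition using a scikit-learn GBDT; it is illustrative only (the function name loo_influence and all parameter choices are hypothetical), and it deliberately pays the full retraining cost that the paper's fixed-tree-structure approximations are designed to avoid.

```python
# Brute-force leave-one-out (LOO) influence for a GBDT prediction.
# Illustrative sketch only -- NOT the paper's efficient method.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

def loo_influence(X_train, y_train, x_test, **gbdt_params):
    """Influence of each training sample on the prediction at x_test,
    measured as the prediction change after leave-one-out retraining."""
    full = GradientBoostingRegressor(random_state=0, **gbdt_params)
    full.fit(X_train, y_train)
    base_pred = full.predict(x_test.reshape(1, -1))[0]

    n = len(X_train)
    influences = np.empty(n)
    for i in range(n):
        mask = np.arange(n) != i  # drop sample i
        reduced = GradientBoostingRegressor(random_state=0, **gbdt_params)
        reduced.fit(X_train[mask], y_train[mask])
        # Positive value: removing sample i raises the prediction at x_test.
        influences[i] = reduced.predict(x_test.reshape(1, -1))[0] - base_pred
    return influences

X, y = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=0)
inf = loo_influence(X[:-1], y[:-1], X[-1], n_estimators=50, max_depth=3)
top = np.argsort(-np.abs(inf))[:5]  # indices of the most influential samples
print(top, inf[top])
```

This brute force costs one full retraining per training sample; the paper's contribution is to approximate the same quantity efficiently by assuming the learned tree structures stay fixed, so that only leaf values need to be adjusted when a sample is left out.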