Computer science
Data mining
Tree (set theory)
Data science
Econometrics
Statistics
Mathematics
Mathematical analysis
Authors
Diya Bhaduri, Daniell Toth, Scott H. Holan
Abstract
Recent advances in data complexity and availability present both challenges and opportunities for automated data exploration. Tree‐based methods, known for their interpretability, are widely used for building regression and classification models. However, they often lag behind the best supervised learning approaches in terms of prediction accuracy. To address this limitation, ensemble methods, such as random forests, combine multiple trees to improve prediction accuracy, though at the cost of interpretability. While tree‐based methods have seen extensive use in various fields, their application in the context of complex survey data has been relatively limited. This article provides an overview of the state‐of‐the‐art tree‐based approaches for analyzing complex survey data. It distinguishes methods explicitly designed for survey contexts from those adapted from other domains. The discussion covers applications in model‐assisted approaches, disclosure limitation, and small area estimation, as well as other recent methodological developments tailored to survey data. Additionally, the article explores aggregated tree models that sacrifice interpretability for improved prediction accuracy. These models, such as Bagging, Random Forests, and Boosting, are explained, along with the concept of out‐of‐bag error for model evaluation. Finally, this article provides the history and development of tree models, from their origins in regression trees to more recent Bayesian approaches, and aggregated tree models. This overview sheds light on the potential utility of tree‐based methods in survey methodology and provides insights into future research directions in this evolving field.
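The idea of out‐of‐bag error for aggregated trees can be illustrated with a minimal sketch. The code below is not taken from the article: it assumes a toy one-dimensional dataset, one-split regression trees ("stumps"), and plain unweighted bootstrap resampling, so it ignores the survey weights and sampling design that the article is concerned with. All function names (`fit_stump`, `bagged_stumps_oob`, and so on) are hypothetical and chosen only for illustration.

```python
import random
from statistics import mean


def fit_stump(xs, ys):
    """Fit a one-split regression tree by minimizing the sum of squared errors."""
    vals = sorted(set(xs))
    best = None
    for lo, hi in zip(vals, vals[1:]):
        t = (lo + hi) / 2  # candidate threshold: midpoint of neighbouring x values
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not left or not right:
            continue
        ml, mr = mean(left), mean(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    if best is None:  # no valid split (e.g. all x identical): predict the mean
        m = mean(ys)
        return (float("inf"), m, m)
    return best[1:]  # (threshold, left mean, right mean)


def predict_stump(stump, x):
    t, ml, mr = stump
    return ml if x <= t else mr


def bagged_stumps_oob(xs, ys, n_trees=50, seed=0):
    """Bagging: fit each stump on a bootstrap sample and score each
    observation using only the stumps that did not see it (out-of-bag)."""
    rng = random.Random(seed)
    n = len(xs)
    stumps = []
    oob_preds = [[] for _ in range(n)]
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        stump = fit_stump([xs[i] for i in idx], [ys[i] for i in idx])
        stumps.append(stump)
        in_bag = set(idx)
        for i in range(n):
            if i not in in_bag:
                oob_preds[i].append(predict_stump(stump, xs[i]))
    # Average the out-of-bag predictions per observation, then take the MSE.
    # Observations that were in every bootstrap sample are skipped.
    scored = [(mean(p), ys[i]) for i, p in enumerate(oob_preds) if p]
    oob_mse = mean((pred - y) ** 2 for pred, y in scored)
    return stumps, oob_mse


def predict_bagged(stumps, x):
    """Aggregate by averaging the predictions of all stumps."""
    return mean(predict_stump(s, x) for s in stumps)


# Toy data (an assumption for illustration): a noisy step function.
data_rng = random.Random(1)
xs = [i / 40 for i in range(40)]
ys = [(0.0 if x < 0.5 else 2.0) + data_rng.gauss(0, 0.1) for x in xs]

stumps, oob_mse = bagged_stumps_oob(xs, ys)
print("out-of-bag MSE:", oob_mse)
print("prediction at x=0.9:", predict_bagged(stumps, 0.9))
```

Because each bootstrap sample omits some observations, the out-of-bag error gives an estimate of prediction error without a separate hold-out set. A design-based version for survey data would need to account for weights and the sampling scheme, which this sketch does not attempt.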