Abstract
Random forest (RF) methodology is a nonparametric methodology for prediction problems. A standard way to use RFs includes generating a global RF to predict all test cases of interest. In this article, we propose growing different RFs specific to different test cases, namely case-specific random forests (CSRFs). In contrast to the bagging procedure in the building of standard RFs, the CSRF algorithm takes weighted bootstrap resamples to create individual trees, where we assign large weights to the training cases in close proximity to the test case of interest a priori. Tuning methods are discussed to avoid overfitting issues. Both simulation and real data examples show that the weighted bootstrap resampling used in CSRF construction can improve predictions for specific cases. We also propose a new case-specific variable importance (CSVI) measure as a way to compare the relative predictor variable importance for predicting a particular case. It is possible that the idea of building a predictor case-specifically can be generalized in other areas.

Key Words: Machine learning; Prediction; Variable importance

ACKNOWLEDGMENT
This work was supported by National Science Foundation (NSF) Plant Genome Award 0922746 and by NSF DMS-1406747.

Notes on contributors
Ruo Xu is Analyst, Google Inc., 1600 Amphitheatre, Mountain View, CA 94043 (E-mail: xuruo.isu@gmail.com). Dan Nettleton is Professor, Department of Statistics, Iowa State University, Ames, IA 50011 (E-mail: dnett@iastate.edu). Daniel J. Nordman is Associate Professor, Department of Statistics, Iowa State University, Ames, IA 50011 (E-mail: dnordman@iastate.edu).
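
The weighted bootstrap resampling described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several pieces are assumptions made only for illustration: the Gaussian-kernel weighting on Euclidean distance, the `bandwidth` parameter, the use of one-split regression stumps in place of full trees, and all function names (`case_weights`, `weighted_bootstrap`, `fit_stump`, `predict_stump`, `predict_case`). The abstract states only that training cases close to the test case receive large weights a priori; how the article measures proximity and tunes the weights is not reproduced here.

```python
import math
import random


def case_weights(X_train, x_test, bandwidth=1.0):
    """ASSUMPTION: Gaussian-kernel weights on Euclidean distance to the test case.

    The abstract only says nearby training cases get large weights;
    this kernel is a stand-in chosen for illustration.
    """
    raw = [
        math.exp(-math.dist(x, x_test) ** 2 / (2.0 * bandwidth ** 2))
        for x in X_train
    ]
    total = sum(raw)  # would be 0.0 if every weight underflows; not handled here
    return [w / total for w in raw]


def weighted_bootstrap(n, weights, rng):
    """Draw n indices with replacement, with probability proportional to weights."""
    return rng.choices(range(n), weights=weights, k=n)


def fit_stump(X, y, feature):
    """Stand-in for a full tree: a single least-squares split on one feature."""
    order = sorted(range(len(y)), key=lambda i: X[i][feature])
    overall_mean = sum(y) / len(y)
    # Default when no split is usable: both sides predict the overall mean.
    best = (float("inf"), float("inf"), overall_mean, overall_mean)
    for cut in range(1, len(order)):
        left, right = order[:cut], order[cut:]
        x_lo, x_hi = X[left[-1]][feature], X[right[0]][feature]
        if x_lo == x_hi:
            continue  # cannot split between identical feature values
        mean_l = sum(y[i] for i in left) / len(left)
        mean_r = sum(y[i] for i in right) / len(right)
        sse = sum((y[i] - mean_l) ** 2 for i in left) + sum(
            (y[i] - mean_r) ** 2 for i in right
        )
        if sse < best[0]:
            best = (sse, (x_lo + x_hi) / 2.0, mean_l, mean_r)
    _, threshold, mean_l, mean_r = best
    return feature, threshold, mean_l, mean_r


def predict_stump(stump, x):
    """Predict with a fitted stump: left mean at or below the threshold, else right."""
    feature, threshold, mean_l, mean_r = stump
    return mean_l if x[feature] <= threshold else mean_r


def predict_case(X_train, y_train, x_test, n_trees=200, bandwidth=1.0, seed=0):
    """Build an ensemble for ONE test case and return its averaged prediction."""
    rng = random.Random(seed)
    n = len(X_train)
    weights = case_weights(X_train, x_test, bandwidth)
    predictions = []
    for _ in range(n_trees):
        idx = weighted_bootstrap(n, weights, rng)  # replaces the uniform bootstrap
        X_boot = [X_train[i] for i in idx]
        y_boot = [y_train[i] for i in idx]
        feature = rng.randrange(len(x_test))  # random feature choice, as in RFs
        stump = fit_stump(X_boot, y_boot, feature)
        predictions.append(predict_stump(stump, x_test))
    return sum(predictions) / len(predictions)


if __name__ == "__main__":
    X = [[float(i)] for i in range(10)]
    y = [0.0] * 5 + [10.0] * 5
    print(predict_case(X, y, [8.0]))
```

The only change relative to ordinary bagging in this sketch is the `weights` argument passed to the resampling step, so a separate ensemble has to be grown for each test case. A very small `bandwidth` concentrates each resample on a handful of training cases, which is one way the overfitting concern mentioned in the abstract could arise.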