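Computer science
Machine learning
Artificial intelligence
Empirical research
Key (lock)
Face (sociological concept)
Control (management)
Regression analysis
Regression
Statistical model
Random forest
Data modeling
Online machine learning
Algorithmic learning theory
Statistical learning
Linear regression
Variable
Instance-based learning
Scheme (mathematics)
Computational learning theory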
Authors
Bowen Shi, Xiaojie Mao, Mochen Yang, Bo Li
Identifier
DOI:10.1287/isre.2024.0888
Abstract
We provide an introduction to double/debiased machine learning (DML), a framework that enables effect estimation when dealing with complex, high-dimensional data. In many empirical analyses, especially in fields such as information systems, researchers face difficult choices about which control variables to include and how to model their relationships with the outcome. These modeling decisions can significantly change results, leading to uncertainty about which findings are reliable. DML offers a practical solution by combining modern machine learning with rigorous statistical inference. The idea is to let flexible ML models (such as random forests or gradient boosting) capture complex relationships among control variables while still delivering reliable estimates for the key effect of interest. DML can be applied to many familiar research designs, including standard regression with controls, instrumental variables, difference in differences, and models that incorporate ML-generated features. Empirical studies and simulations show that DML is typically more robust to misspecification than traditional regression and more reliable than earlier semiparametric methods. However, DML is not automatic—it still requires sound research design and high-quality machine learning estimation. Used thoughtfully, DML provides a powerful, flexible, and statistically grounded approach for empirical research in modern data environments.
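The abstract describes the core idea only in words, so the following is a minimal sketch of one common DML setup, not the authors' implementation. It assumes a partially linear model, Y = θ·D + g(X) + ε with D = m(X) + v, and estimates θ by cross-fitted "partialling out": predict Y and D from the controls X on held-out folds, then regress the outcome residuals on the treatment residuals. To stay dependency-light, a polynomial least-squares fit stands in for the flexible learners named in the abstract (random forests, gradient boosting). All function names (`poly_features`, `fit_predict`, `dml_plr`), the simulated data, and the true effect of 0.5 are illustrative assumptions.

```python
import numpy as np


def poly_features(X, degree=4):
    """Intercept plus per-column powers 1..degree (no interaction terms)."""
    cols = [np.ones(len(X))]
    for p in range(1, degree + 1):
        cols.append(X ** p)
    return np.column_stack(cols)


def fit_predict(X_train, y_train, X_test, degree=4):
    """Stand-in nuisance learner (assumption): polynomial least squares.
    In practice this would be a random forest, gradient boosting, etc."""
    A = poly_features(X_train, degree)
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return poly_features(X_test, degree) @ coef


def dml_plr(y, d, X, n_folds=5, seed=0):
    """Cross-fitted partialling-out estimate of theta in
    y = theta * d + g(X) + eps, with a sandwich standard error."""
    n = len(y)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), n_folds)
    y_res = np.empty(n)
    d_res = np.empty(n)
    for test_idx in folds:
        train = np.ones(n, dtype=bool)
        train[test_idx] = False
        # Nuisance functions are fit on the other folds only (cross-fitting).
        y_res[test_idx] = y[test_idx] - fit_predict(X[train], y[train], X[test_idx])
        d_res[test_idx] = d[test_idx] - fit_predict(X[train], d[train], X[test_idx])
    # Residual-on-residual regression gives the effect estimate.
    theta = np.sum(d_res * y_res) / np.sum(d_res ** 2)
    psi = (y_res - theta * d_res) * d_res
    se = np.sqrt(np.mean(psi ** 2) / n) / np.mean(d_res ** 2)
    return theta, se


# Simulated data (assumption): controls affect treatment and outcome nonlinearly.
rng = np.random.default_rng(42)
n, theta_true = 2000, 0.5
X = rng.uniform(-2, 2, size=(n, 2))
d = np.cos(X[:, 0]) + rng.normal(size=n)
y = theta_true * d + 2 * np.cos(X[:, 0]) + X[:, 1] + rng.normal(size=n)

theta_hat, se = dml_plr(y, d, X)

# Naive benchmark: OLS of y on d with controls entered linearly (misspecified here).
Z = np.column_stack([np.ones(n), d, X])
naive = np.linalg.lstsq(Z, y, rcond=None)[0][1]

print(f"DML estimate:   {theta_hat:.3f} (SE {se:.3f})")
print(f"Naive estimate: {naive:.3f}")
print(f"True effect:    {theta_true}")
```

In this simulation the linear-controls regression misses the nonlinear confounding through `cos(X[:, 0])`, while the cross-fitted estimate should land close to the true effect. That outcome depends on the nuisance learner approximating both conditional means well, which matches the abstract's caveat that DML still requires high-quality machine learning estimation.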