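Computer science
Machine learning
Artificial intelligence
Imperfect
Trustworthiness
Inference
Statistical inference
Simplicity (philosophy)
Causal inference
Data mining
Statistical model
Robust optimization
Robustness (evolution)
Signal (programming language)
Noise (video)
Statistical learning
Online machine learning
Decision tree
Statistical hypothesis testing
Outlier
Statistical learning theory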
Authors
Aaron Schecter,Weifeng Li
Identifier
DOI:10.1287/isre.2023.0340
Abstract
Organizations increasingly use machine learning to turn text, images, and other unstructured data into variables that inform decisions and research. But, because machine learning predictions are never perfect, the resulting data can contain errors that quietly distort statistical analyses, sometimes leading to incorrect conclusions about what truly drives important outcomes. This study introduces a robust optimization approach that helps analysts and decision makers draw more reliable insights when working with machine learning–generated data. The method is designed to strengthen the signal of real effects, reducing the influence of noisy or imperfect predictions, resulting in more trustworthy hypothesis tests and fewer missed or misleading findings. The approach also includes a simple correction step that uses a small amount of high-quality labeled data—such as a subset of manually reviewed cases—to further improve accuracy. Across simulations and a real-world example using Amazon reviews, the method consistently delivers more dependable results than common alternatives. For professionals who rely on machine learning in areas such as marketing, operations, public policy, or risk management, this framework offers a practical, transparent way to ensure that conclusions remain sound even when data sources are imperfect.