Predicting with Proxies: Transfer Learning in High Dimension

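Proxy (statistics), Predictive analytics, Computer science, Heuristic, Estimator, Big data, Analytics, Population, Data mining, Data science, Machine learning, Econometrics, Statistics, Mathematics, Sociology, Demography, Operating system
Author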
Hamsa Bastani
Source
Journal: Management Science [Institute for Operations Research and the Management Sciences]
Volume (Issue): 67 (5): 2964-2984 Citations: 90
Identifier
DOI:10.1287/mnsc.2020.3729
Abstract

Predictive analytics is increasingly used to guide decision making in many applications. However, in practice, we often have limited data on the true predictive task of interest and must instead rely on more abundant data on a closely related proxy predictive task. For example, e-commerce platforms use abundant customer click data (proxy) to make product recommendations rather than the relatively sparse customer purchase data (true outcome of interest); alternatively, hospitals often rely on medical risk scores trained on a different patient population (proxy) rather than their own patient population (true cohort of interest) to assign interventions. Yet, not accounting for the bias in the proxy can lead to suboptimal decisions. Using real data sets, we find that this bias can often be captured by a sparse function of the features. Thus, we propose a novel two-step estimator that uses techniques from high-dimensional statistics to efficiently combine a large amount of proxy data and a small amount of true data. We prove upper bounds on the error of our proposed estimator and lower bounds on several heuristics used by data scientists; in particular, our proposed estimator can achieve the same accuracy with exponentially less true data (in the number of features d). Finally, we demonstrate the effectiveness of our approach on e-commerce and healthcare data sets; in both cases, we achieve significantly better predictive accuracy as well as managerial insights into the nature of the bias in the proxy data. This paper was accepted by George Shanthikumar, big data and analytics.
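
The abstract does not spell out the two steps, so the following is only a minimal sketch of one plausible reading: first fit a linear model on the abundant proxy data, then fit an L1-penalized (sparse) correction on the scarce true data and add it to the proxy fit. The linear model, the coordinate-descent solver, the penalty value `lam=0.05`, the synthetic data, and all function and variable names are assumptions made for illustration and are not taken from the paper.

```python
import random

def soft_threshold(z, t):
    """Shrink z toward zero by t (the proximal step for an L1 penalty)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_cd(X, y, lam, n_iter=100):
    """Coordinate descent for (1/2n)*||y - Xb||^2 + lam*||b||_1.
    With lam = 0 this reduces to ordinary least squares."""
    n, d = len(X), len(X[0])
    b = [0.0] * d
    resid = list(y)  # equals y - Xb, since b starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(d)]
    for _ in range(n_iter):
        for j in range(d):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the residual that excludes b[j].
            rho = sum(X[i][j] * (resid[i] + X[i][j] * b[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            diff = new_bj - b[j]
            if diff != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * diff
                b[j] = new_bj
    return b

def predict(X, b):
    return [sum(x * w for x, w in zip(row, b)) for row in X]

def two_step_estimate(X_proxy, y_proxy, X_true, y_true, lam):
    # Step 1 (assumed): unpenalized fit on the large proxy sample.
    beta_proxy = lasso_cd(X_proxy, y_proxy, lam=0.0)
    # Step 2 (assumed): sparse fit of the proxy's bias on the small true sample.
    gap = [yt - yp for yt, yp in zip(y_true, predict(X_true, beta_proxy))]
    delta = lasso_cd(X_true, gap, lam)
    beta = [bp + dl for bp, dl in zip(beta_proxy, delta)]
    return beta, delta

# Synthetic demo (hypothetical data): the proxy is biased in one feature only.
rng = random.Random(0)
d, n_proxy, n_true = 10, 300, 30
beta_proxy_star = [rng.uniform(-1, 1) for _ in range(d)]
delta_star = [1.0] + [0.0] * (d - 1)
beta_true_star = [p + q for p, q in zip(beta_proxy_star, delta_star)]

def simulate(n, coef):
    X = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
    y = [v + rng.gauss(0, 0.1) for v in predict(X, coef)]
    return X, y

X_proxy, y_proxy = simulate(n_proxy, beta_proxy_star)
X_true, y_true = simulate(n_true, beta_true_star)
beta_hat, delta_hat = two_step_estimate(X_proxy, y_proxy, X_true, y_true, lam=0.05)
print("estimated bias:", [round(v, 2) for v in delta_hat])
```

In this sketch the penalty is applied only to the correction term, which reflects the abstract's observation that the proxy's bias can often be captured by a sparse function of the features. In practice the penalty would need to be tuned, for example by cross-validation on the true data.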