Logistic regression
Rare events
Computer science
Statistics
Regression
Econometrics
Mathematics
Machine learning
Authors
Xuetong Li, Xuening Zhu, Hansheng Wang
Identifier
DOI:10.5705/ss.202022.0242
Abstract
Large-scale rare events data are commonly encountered in practice. To tackle the massive rare events data, we propose a novel distributed estimation method for logistic regression in a distributed system. For a distributed framework, we face the following two challenges. The first challenge is how to distribute the data. In this regard, two different distribution strategies (i.e., the RANDOM strategy and the COPY strategy) are investigated. The second challenge is how to select an appropriate type of objective function so that the best asymptotic efficiency can be achieved. Then, the under-sampled (US) and inverse probability weighted (IPW) types of objective functions are considered. Our results suggest that the COPY strategy together with the IPW objective function is the best solution for distributed logistic regression with rare events. The finite sample performance of the distributed methods is demonstrated by simulation studies and a real-world Sweden Traffic Sign dataset.
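
The combination the abstract recommends (COPY strategy with an IPW objective) can be illustrated with a minimal Python sketch. This is not the authors' implementation; it rests on assumptions the abstract does not spell out: that COPY places every positive case on every worker while negatives are split at random, that each local negative is weighted by the number of workers (the inverse of its probability of landing on a given worker), and that the local estimates are combined by simple averaging. The function names `fit_weighted_logistic` and `copy_ipw_estimate` and the simulated data are hypothetical.

```python
import numpy as np

def fit_weighted_logistic(X, y, w, n_iter=50, tol=1e-10):
    """Weighted logistic regression by Newton-Raphson (X includes an intercept column)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (w * (y - p))                      # weighted score
        hess = (X * (w * p * (1.0 - p))[:, None]).T @ X  # weighted information
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

def copy_ipw_estimate(X, y, n_workers, rng):
    """Assumed COPY + IPW scheme: positives copied to all workers, negatives split."""
    pos = np.flatnonzero(y == 1)
    neg = rng.permutation(np.flatnonzero(y == 0))
    local_estimates = []
    for part in np.array_split(neg, n_workers):
        idx = np.concatenate([pos, part])
        # Assumption: a negative reaches a given worker with probability
        # 1 / n_workers, so its inverse probability weight is n_workers.
        w = np.where(y[idx] == 1, 1.0, float(n_workers))
        local_estimates.append(fit_weighted_logistic(X[idx], y[idx], w))
    # Assumption: one-shot averaging of the local estimators.
    return np.mean(local_estimates, axis=0)

# Simulated rare-events data (hypothetical parameters, for illustration only).
rng = np.random.default_rng(0)
n = 200_000
beta_true = np.array([-6.0, 1.0, -1.0])  # strongly negative intercept -> rare positives
X = np.column_stack([np.ones(n), rng.standard_normal((n, 2))])
prob = 1.0 / (1.0 + np.exp(-(X @ beta_true)))
y = (rng.random(n) < prob).astype(float)

beta_hat = copy_ipw_estimate(X, y, n_workers=5, rng=rng)
beta_full = fit_weighted_logistic(X, y, np.ones(n))  # single-machine benchmark
print("distributed COPY+IPW:", beta_hat)
print("full-data fit:       ", beta_full)
```

Under these assumptions, each worker's weighted objective is an unbiased version of the full-data log-likelihood, which is why the averaged estimate should track the single-machine fit closely on this simulated data.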