Real estate
Business
Computer science
Econometrics
Advertising
Economics
Finance
Source
Journal: Social Science Research Network
[Social Science Electronic Publishing]
Date: 2022-01-01
Abstract
I propose a solution to content removal bias in statistics computed from web-scraped data. Content removal bias occurs when data are removed from the web before a scraper is able to collect them. The solution I propose is based on inverse probability weights, derived from the parameters of a survival function estimated under complex forms of data censoring. I apply this solution to calculating the proportion of newly built dwellings from web-scraped data for Luxembourg. The results show that in applications like this one, where scraping is frequent relative to the average time the data remain online, content removal bias is relatively small.
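The idea of inverse probability weighting against content removal can be illustrated with a deliberately simplified sketch. It is not the author's estimator: it assumes (hypothetically) that each listing's time online is exponential with a group-specific removal rate, that only right censoring occurs (listings still online at the last scrape), and that a listing's posting time is uniform within a scraping interval of length `interval`. Under those assumptions the chance a listing is still online at the next scrape is (1 - e^(-λΔ)) / (λΔ), and each captured listing is weighted by the inverse of that probability. The names `Listing`, `exponential_rate`, `capture_probability` and `ipw_share_new` are illustrative only, and the sketch ignores that durations are themselves observed only for captured listings.

```python
import math
from dataclasses import dataclass


@dataclass
class Listing:
    """One scraped listing (hypothetical schema for illustration)."""
    new_build: bool   # True if the dwelling is newly built
    duration: float   # observed days online
    removed: bool     # True if removal was observed; False means right-censored


def exponential_rate(listings):
    """MLE of an exponential removal rate under right censoring:
    observed removals divided by total time at risk."""
    events = sum(1 for x in listings if x.removed)
    exposure = sum(x.duration for x in listings)
    if exposure <= 0:
        raise ValueError("need positive total exposure time")
    return events / exposure


def capture_probability(rate, interval):
    """P(listing still online at the next scrape), assuming an exponential
    lifetime and a posting time uniform within the scraping interval:
    (1 - exp(-rate * interval)) / (rate * interval)."""
    x = rate * interval
    if x == 0:
        return 1.0
    return -math.expm1(-x) / x  # -expm1(-x) == 1 - exp(-x), accurate for small x


def ipw_share_new(listings, interval):
    """Share of newly built dwellings, weighting each captured listing by
    1 / (capture probability of its group)."""
    weights = {}
    for flag in (True, False):
        members = [x for x in listings if x.new_build == flag]
        if members:
            p = capture_probability(exponential_rate(members), interval)
            weights[flag] = 1.0 / p
    total = sum(weights[x.new_build] for x in listings)
    new = sum(weights[x.new_build] for x in listings if x.new_build)
    return new / total
```

When the scraping interval is short compared with the mean time online (1/λ), `capture_probability` is close to 1 for every group, the weights are nearly equal, and the weighted share barely differs from the naive share. This mirrors the abstract's finding that frequent scraping keeps content removal bias small. If one group (say, new builds) is removed faster, its weight is larger and the naive share understates it.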