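Computer science
Vocabulary
Recommender system
Artificial neural network
Artificial intelligence
Sampling (signal processing)
Sampling bias
Data cleansing
Machine learning
Data mining
Information retrieval
Data quality
Sample size determination
Statistics
Metric (unit)
Philosophy
Linguistics
Mathematics
Operations management
Filter (signal processing)
Economics
Computer vision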
Authors
Xinyang Yi, Ji Yang, Lichan Hong, Derek Zhiyuan Cheng, Lukasz Heldt, Aditee Kumthekar, Zhe Zhao, Wei Li, Ed H.
Source
Venue: Conference on Recommender Systems
Date: 2019-09-10
Pages: 269-277
Citations: 158
Identifiers
DOI: 10.1145/3298689.3346996
Abstract
Many recommendation systems retrieve and score items from a very large corpus. A common recipe to handle data sparsity and power-law item distribution is to learn item representations from their content features. Apart from many content-aware systems based on matrix factorization, we consider a modeling framework using a two-tower neural net, with one of the towers (the item tower) encoding a wide variety of item content features. A general recipe for training such two-tower models is to optimize loss functions calculated from in-batch negatives, which are items sampled from a random mini-batch. However, in-batch loss is subject to sampling biases, potentially hurting model performance, particularly in the case of highly skewed distributions. In this paper, we present a novel algorithm for estimating item frequency from streaming data. Through theoretical analysis and simulation, we show that the proposed algorithm can work without requiring a fixed item vocabulary, and is capable of producing unbiased estimates and adapting to changes in item distribution. We then apply the sampling-bias-corrected modeling approach to build a large-scale neural retrieval system for YouTube recommendations. The system is deployed to retrieve personalized suggestions from a corpus with tens of millions of videos. We demonstrate the effectiveness of sampling-bias correction through offline experiments on two real-world datasets. We also conduct live A/B tests to show that the neural retrieval system leads to improved recommendation quality for YouTube.
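
To make the two ideas in the abstract concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not spell out either formula, so the details below are illustrative assumptions. The first function applies the common log-probability correction to in-batch softmax logits (subtracting the log of each item's estimated sampling probability). The second is a hypothetical estimator that hashes item IDs into buckets, so no fixed vocabulary is needed, and keeps a moving average of the gap between successive occurrences, so the estimate can follow distribution changes. All names (`corrected_in_batch_loss`, `StreamingFrequencyEstimator`, `num_buckets`, `alpha`) are made up for this sketch.

```python
import math


def corrected_in_batch_loss(logits, sampling_probs):
    """Mean in-batch softmax loss with a log-probability correction.

    logits[i][j] is the score of query i against the item in batch row j;
    the positive item for query i sits on the diagonal (j == i).
    sampling_probs[j] is the estimated probability of item j being sampled.
    """
    n = len(logits)
    total = 0.0
    for i in range(n):
        # Assumed correction: subtract log p_j so that frequent items are
        # not over-penalised merely for showing up often as negatives.
        corrected = [logits[i][j] - math.log(sampling_probs[j]) for j in range(n)]
        # Numerically stable log-sum-exp over the batch.
        m = max(corrected)
        log_z = m + math.log(sum(math.exp(c - m) for c in corrected))
        total += log_z - corrected[i]
    return total / n


class StreamingFrequencyEstimator:
    """Hypothetical estimator: moving average of gaps between occurrences."""

    def __init__(self, num_buckets=1024, alpha=0.1):
        self.num_buckets = num_buckets
        self.alpha = alpha
        self.last_step = [0] * num_buckets   # step at which a bucket was last hit
        self.interval = [0.0] * num_buckets  # smoothed gap between hits (0.0 = unseen)

    def _bucket(self, item_id):
        # Hashing removes the need for a fixed item vocabulary; collisions
        # make colliding items look more frequent than they are.
        return hash(item_id) % self.num_buckets

    def update(self, item_id, step):
        b = self._bucket(item_id)
        gap = step - self.last_step[b]
        if self.interval[b] == 0.0:
            # Simplification: seed the average with the first observed gap.
            self.interval[b] = float(gap)
        else:
            self.interval[b] = (1 - self.alpha) * self.interval[b] + self.alpha * gap
        self.last_step[b] = step

    def probability(self, item_id):
        gap = self.interval[self._bucket(item_id)]
        # An item seen every `gap` steps has per-step probability 1 / gap.
        return 1.0 / gap if gap > 0 else 0.0
```

In a training loop, one would call `update` for every item in each mini-batch with the current global step, then pass `probability` values for the batch items as `sampling_probs`. With uniform probabilities the correction only shifts every logit by the same constant, so the loss is unchanged; it matters only when the item distribution is skewed.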