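Computer science
Cluster analysis
Sampling (signal processing)
Consistency (knowledge bases)
Data mining
Inference
Artificial intelligence
Machine learning
Sequence (biology)
Recommender system
Noise (video)
Task (project management)
Relevance (law)
Engineering
Political science
Law
Systems engineering
Image (mathematics)
Filter (signal processing)
Biology
Genetics
Computer vision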
Authors
Yuren Zhang, Enhong Chen, Binbin Jin, Hao Wang, Min Hou, Wei Huang, Runlong Yu
Identifier
DOI:10.1145/3477495.3531829
Abstract
Click-through rate (CTR) prediction is fundamental in many industrial applications, such as online advertising and recommender systems. With the development of online platforms, sequential user behaviors grow rapidly, bringing a great opportunity to better understand user preferences. However, it is extremely challenging for existing sequential models to effectively utilize the entire behavior history of each user. First, there is a lot of noise in such long histories, which can seriously hurt prediction performance. Second, feeding the long behavior sequence directly into a model results in infeasible inference time and storage cost. To tackle these challenges, in this paper we propose a novel framework, which we call User Behavior Clustering Sampling (UBCS). In UBCS, short sub-sequences are obtained from the whole user history sequence with two cascaded modules: (i) the Behavior Sampling module samples short sequences related to candidate items using a novel sampling method that takes relevance and temporal information into consideration; (ii) the Item Clustering module clusters items into a small number of cluster centroids, mitigating the impact of noise and improving efficiency. The sampled short sub-sequences are then fed into the CTR prediction module for efficient prediction. Moreover, we conduct a self-supervised consistency pre-training task to extract user persona preference and optimize the sampling module effectively. Experiments on real-world datasets demonstrate the superiority and efficiency of our proposed framework.
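The abstract does not give the sampling or clustering formulas, so the following is only a rough Python sketch of the general idea, not the UBCS method itself. It assumes items are plain embedding vectors, scores each past behavior by cosine similarity to the candidate multiplied by an exponential recency decay, and then summarizes the kept items with a basic k-means. The function names, the `decay` parameter, the scoring rule, the choice of k-means, and the order of the two steps are all illustrative assumptions.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def sample_behaviors(history, candidate, k, decay=0.1):
    """Keep the k behaviors with the highest relevance-times-recency score.

    history: item embedding vectors, oldest first (assumed representation).
    Returns the kept indices in chronological order.
    """
    n = len(history)
    scored = []
    for i, item in enumerate(history):
        age = n - 1 - i  # 0 for the most recent behavior
        # Assumed scoring rule: relevance damped by an exponential recency decay.
        scored.append((cosine(item, candidate) * math.exp(-decay * age), i))
    top = sorted(scored, reverse=True)[:k]
    return sorted(i for _, i in top)

def kmeans(points, n_clusters, n_iter=10):
    """Plain k-means with the first n_clusters points as initial centroids."""
    centroids = [list(p) for p in points[:n_clusters]]
    for _ in range(n_iter):
        groups = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])),
            )
            groups[nearest].append(p)
        for j, group in enumerate(groups):
            if group:  # keep the old centroid if a cluster ends up empty
                centroids[j] = [sum(col) / len(group) for col in zip(*group)]
    return centroids

# Toy data: five 2-d "item embeddings" and one candidate item.
history = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9], [1.0, 0.1]]
candidate = [1.0, 0.0]
kept = sample_behaviors(history, candidate, k=3)
centroids = kmeans([history[i] for i in kept], n_clusters=2)
```

In this toy run the behaviors pointing away from the candidate are dropped, and the short list of centroids, rather than the full history, is what a downstream CTR model would consume.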