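Computer science
Recommender system
Popularity
Markov decision process
Reinforcement learning
Robustness (evolution)
Term (time)
Machine learning
Markov process
Entropy (arrow of time)
Principle of maximum entropy
Artificial intelligence
Gene
Chemistry
Physics
Statistics
Social psychology
Quantum mechanics
Biochemistry
Mathematics
Psychology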
Authors
Xiaoyu Shi, Quanliang Liu, Hong Xie, Yan Bai, Mingsheng Shang
Source
Journal: IEEE Transactions on Services Computing
[Institute of Electrical and Electronics Engineers]
Date: 2024-01-01
Volume/Issue: 1-14
Identifier
DOI:10.1109/tsc.2024.3349636
Abstract
This article considers the problem of maintaining the long-term fairness of item exposure in interactive recommender systems in a dynamic setting where user preference and item popularity evolve over time. The challenge is that the evolving dynamics of user preference and item popularity in the feedback loop amplify the long-term “unfairness” of item exposure. To address this challenge, we first formulate a constrained Markov Decision Process (MDP) to capture the evolving dynamics of user preference. The proposed constrained MDP imposes long-term fairness requirements via maximum entropy techniques. Moreover, to counteract the “unfairness” amplifying effect caused by the evolving dynamics of item popularity in the feedback loop, we design a debiased reward function to eliminate popularity bias in the training data. As a result, the proposed framework can maintain acceptable recommendation accuracy while exposing items as randomly as possible, ensuring long-term benefits for users. To address the data sparsity issue, the proposed framework can easily integrate self-supervised learning methods to enhance state representation. Experiments on three datasets and an authentic Reinforcement Learning environment (Virtual-Taobao) demonstrate the effectiveness and superiority of the proposed framework in terms of recommendation accuracy and fairness, and show its robustness against data sparsity and noise.