Optimizing and Learning Assortment Decisions in the Presence of Platform Disengagement
Disengagement theory
Business
Computer science
Medicine
Gerontology
Authors
Mika Sumida, Angela Zhou
Source
Journal: Social Science Research Network [Social Science Electronic Publishing] Date: 2023-01-01
Identifier
DOI:10.2139/ssrn.4537925
Abstract
Problem definition: We consider a problem where customers repeatedly interact with a platform. During each interaction with the platform, the customer is shown an assortment of items and selects among these items according to a Multinomial Logit choice model. The probability that a customer interacts with the platform in the next period depends on the customer’s past purchase history. The goal of the platform is to maximize the total revenue obtained from each customer over a finite time horizon.

Methodology/results: First, we study a non-learning version of the problem where consumer preferences and return probabilities are completely known. We formulate the problem as a dynamic program and prove structural properties of the optimal policy. Next, we provide a formulation in a contextual episodic reinforcement learning setting, where the parameters governing contextual consumer preferences and return probabilities are unknown and learned over multiple episodes. We develop an algorithm based on the principle of optimism under uncertainty for this problem and provide a regret bound.

Managerial implications: Previous approaches that address user disengagement often constrain exploration. However, in our model with non-permanent disengagement with assortments, the optimal solution simply offers larger assortments at the beginning of the horizon and exploration is unconstrained during the learning process. We numerically illustrate model insights and demonstrate regimes where our algorithm outperforms naively myopic learning algorithms.
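
To make the non-learning setting in the abstract concrete, the following is a minimal Python sketch of a finite-horizon dynamic program over assortments under a Multinomial Logit choice model. It is not the authors' formulation. It assumes, for illustration only, that the return probability depends solely on whether the customer purchased in the current period, that an absent customer may return later (non-permanent disengagement), and that the item set is small enough to enumerate every assortment by brute force. All names (`mnl_probs`, `solve_dp`, `q_buy`, `q_none`, `q_absent`) and all numbers are hypothetical.

```python
from itertools import combinations


def mnl_probs(assortment, v):
    """MNL choice probabilities with the no-purchase weight normalized to 1.

    Returns (dict of purchase probabilities by item, no-purchase probability).
    """
    denom = 1.0 + sum(v[i] for i in assortment)
    return {i: v[i] / denom for i in assortment}, 1.0 / denom


def solve_dp(v, r, T, q_buy, q_none, q_absent):
    """Backward induction over T periods (illustrative assumptions only).

    v: preference weights, r: per-item revenues.
    q_buy / q_none: assumed probability the customer is present next period
        after a purchase / after no purchase.
    q_absent: assumed probability an absent customer returns next period.
    Returns (expected revenue starting present in period 1, assortment per period).
    """
    items = list(v)
    subsets = [s for k in range(1, len(items) + 1)
               for s in combinations(items, k)]
    v_present, v_absent = 0.0, 0.0  # terminal values after the last period
    policy = []
    for _ in range(T):  # iterate from the last period back to the first
        cont_buy = q_buy * v_present + (1 - q_buy) * v_absent
        cont_none = q_none * v_present + (1 - q_none) * v_absent
        cont_absent = q_absent * v_present + (1 - q_absent) * v_absent

        def value(assortment):
            probs, p0 = mnl_probs(assortment, v)
            buy = sum(p * (r[i] + cont_buy) for i, p in probs.items())
            return buy + p0 * cont_none

        best = max(subsets, key=value)
        policy.append(best)
        v_present, v_absent = value(best), cont_absent
    policy.reverse()  # policy[0] is the assortment offered in period 1
    return v_present, policy


if __name__ == "__main__":
    weights = {"a": 1.0, "b": 0.8, "c": 0.5}   # hypothetical MNL weights
    revenues = {"a": 1.0, "b": 1.5, "c": 3.0}  # hypothetical revenues
    total, plan = solve_dp(weights, revenues, T=5,
                           q_buy=0.9, q_none=0.4, q_absent=0.2)
    print(total, plan)
```

Printing `plan` for different values of `q_buy` and `q_none` is one way to see how the chosen assortments change over the horizon under these assumed dynamics. Brute-force enumeration is used only for readability and scales exponentially in the number of items.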