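Revenue management
Computer science
Regret
Nonparametric statistics
Revenue
Mathematical optimization
Robustness (evolution)
Upper and lower bounds
Operations research
Product (mathematics)
Dynamic pricing
Economics
Microeconomics
Econometrics
Budget constraint
Stochastic discount factor
Expected return
Supply chain
Feature (linguistics)
Supply and demand
Lost sales
Demand forecasting
Complete information
Identifier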
DOI:10.1177/10591478261424032
Abstract
Product returns are prevalent in practice. Many retailers provide lenient free return policies but with a specific return window within which customers are allowed to return products. Motivated by this phenomenon, we consider a single-product online learning and pricing problem with stochastic product returns. A salient feature is that the demand function, depending on price and return window decisions, is initially unknown and must be learned on the fly. The retailer thus faces the classic exploration–exploitation trade-off. Moreover, we consider an inventory constraint, introducing an additional trade-off between earning revenue and managing inventory. We propose a modeling framework to integrate pricing and return window decisions, and develop a deterministic fluid model that serves as the full-information benchmark. To tackle the learning problem, we design a novel nonparametric learning algorithm that seamlessly integrates inverse stochastic gradient descent (SGD) and Upper Confidence Bound (UCB) methods. Under mild assumptions on demand and revenue functions, we establish a regret upper bound for our learning algorithm as O(WT log T), where W denotes the number of return window candidates and T denotes the time horizon. This result aligns with lower bounds established in both online pricing and multi-armed bandit (MAB) literature. Numerical experiments are conducted to verify the effectiveness and robustness of our algorithm across various environments. From an operational standpoint, retailers can use our learning framework as a decision-support tool to identify the optimal price and return window.