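Reinforcement learning
Markov decision process
Computer science
Order (exchange)
Free-rider problem
Process (computing)
Recommender system
Ranking (information retrieval)
Markov process
Artificial intelligence
Operations research
Machine learning
Engineering
Business
Economics
Finance
Public good
Microeconomics
Statistics
Mathematics
Operating system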
Authors
Xing Wang, Ling Wang, Chenxin Dong, Hao Ren, Ke Xing
Identifier
DOI:10.1109/tits.2023.3237580
Abstract
As an important part of intelligent transportation systems, On-demand Food Delivery (OFD) has become a prevalent logistics service in modern society. As the scale of transactions continues to grow, rider-centered assignment is gaining more traction than traditional platform-centered assignment among food delivery companies. However, problems such as the dynamic arrival of orders, uncertain rider behavior, and various forms of false-negative feedback prevent the platform from making proper decisions when interacting with riders. To address these issues, we propose an online Deep Reinforcement Learning-based Order Recommendation (DRLOR) framework to solve the decision-making problem in the OFD scenario. The problem is modeled as a Markov Decision Process (MDP). The DRLOR framework mainly consists of three networks: the actor-critic network, which learns an optimal order ranking policy at each interaction step; the rider behavior prediction network, which predicts the grabbing behavior of riders; and the attention-based feedback correlation network, which separates valid feedback from false feedback and learns a high-dimensional state embedding to represent the states of riders. Extensive offline and online experiments are conducted on the Meituan delivery platform, and the results demonstrate that the proposed DRLOR framework can significantly shorten the length of interactions between riders and the platform, leading to a better experience for both riders and customers.
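The abstract does not give the network details, so the following is only a rough illustration of the general idea of attention-based feedback pooling followed by order ranking. It is not the DRLOR implementation: the function names (`attention_state`, `rank_orders`), the use of plain scaled dot-product attention, the dot-product scoring rule, and the toy vectors are all assumptions made for this sketch.

```python
import math


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attention_state(query, feedback_vectors):
    """Pool feedback vectors into one state vector (hypothetical helper).

    Uses scaled dot-product attention: feedback items that align with the
    query (e.g. a rider profile vector) receive larger weights, so less
    relevant feedback contributes less to the state.
    """
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, f) / scale for f in feedback_vectors])
    dim = len(feedback_vectors[0])
    state = [
        sum(w * f[i] for w, f in zip(weights, feedback_vectors))
        for i in range(dim)
    ]
    return state, weights


def rank_orders(state, order_vectors):
    """Return order indices sorted by descending score (hypothetical helper).

    The score is a plain dot product between the state and each order
    vector; a learned actor network would replace this in practice.
    """
    scores = [dot(state, o) for o in order_vectors]
    return sorted(range(len(order_vectors)), key=lambda i: scores[i], reverse=True)


# Toy data: two feedback vectors, two candidate orders.
query = [1.0, 0.0]
feedback = [[1.0, 0.0], [0.0, 1.0]]
state, weights = attention_state(query, feedback)
ranking = rank_orders(state, [[0.0, 1.0], [1.0, 0.0]])
```

In this toy setup the first feedback vector matches the query and gets the larger weight, so the candidate order aligned with it is ranked first.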