Data-Driven Optimization for Meal Delivery: A Reinforcement Learning Approach for Order-Courier Assignment and Routing at Meituan

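Reinforcement learning; Benchmark (surveying); Computer science; Function (biology); Scalability; Order (exchange); Operations research; Service (business); Vehicle routing problem; Routing (electronic design automation); Cost reduction; Linear programming; Benchmarking; Reduction (mathematics); Generalization; Shortage (economics); Bellman equation; Resource allocation; Flexibility (engineering); Mathematical optimization; Value (mathematics); Resource management (computing); Stability (learning theory); Resource (disambiguation); Service level; Simulation-based optimization; Strategic planning; Operational efficiency; Multi-armed bandit; Optimization problem; Heuristic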

Authors

Ramón Auad, Felipe Lagos, Tomás Lagos

Source

Journal: Transportation Science [Institute for Operations Research and the Management Sciences]
Date: 2026-04-27

Identifier

DOI: 10.1287/trsc.2025.0129

Abstract

The rapid growth of online meal delivery has introduced complex logistical challenges, where platforms must dynamically assign orders to couriers while accounting for demand uncertainty, courier autonomy, and service efficiency. Traditional dispatching methods, often focused on short-term cost minimization, fail to capture the long-term implications of assignment decisions on system-wide performance. This paper presents a novel hybrid framework that integrates reinforcement learning with hyper-heuristic optimization to improve sequential order assignment and routing decisions in meal delivery operations. Our approach combines n-step state-action-reward-state-action with value function approximation and a multiarmed bandit-based hyper-heuristic incorporating seven specialized low-level heuristics. Our approach explicitly models the evolving system state, enabling dispatching policies that balance immediate efficiency with future operational performance. By employing scalable linear value function approximation, we enhance policy learning in high-dimensional environments while maintaining generalization across states and actions. Using real operational data from the food delivery platform Meituan, we develop a comprehensive simulation environment that captures order dynamics, courier behavior, and service times. Through extensive computational experiments, we demonstrate that our framework significantly outperforms traditional benchmark policies, achieving 12% cost reduction through strategic order postponement. Our results reveal that the largest improvements occur during high-demand periods with courier shortages and that a 10% increase in courier availability yields greater benefits than algorithmic improvements alone. The proposed methodology effectively balances immediate operational efficiency with long-term performance while providing valuable insights for meal delivery platforms regarding courier fleet management and order assignment strategies. 
History: This paper has been accepted for the Transportation Science Special Issue The First INFORMS TSL Data-Driven Research Challenge.
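
The abstract names two building blocks: n-step state-action-reward-state-action (SARSA) with linear value function approximation, and a multiarmed bandit that chooses among seven low-level heuristics. The Python sketch below illustrates both in their textbook form only. It is not the authors' implementation: the feature vectors, step size, discount factor, and reward signal are hypothetical placeholders, and because the abstract does not say which bandit rule is used, UCB1 is swapped in purely for illustration.

```python
import math
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def n_step_sarsa_update(
    weights: Sequence[float],
    features: Sequence[float],            # x(S_t, A_t), hypothetical features
    rewards: Sequence[float],             # R_{t+1}, ..., R_{t+n}
    bootstrap_features: Sequence[float],  # x(S_{t+n}, A_{t+n})
    gamma: float,
    alpha: float,
) -> List[float]:
    """One semi-gradient n-step SARSA step for linear q(s, a) = w . x(s, a)."""
    n = len(rewards)
    # n-step return: discounted rewards plus the bootstrapped tail value.
    g = sum((gamma ** k) * r for k, r in enumerate(rewards))
    g += (gamma ** n) * dot(weights, bootstrap_features)
    td_error = g - dot(weights, features)
    # For a linear approximator, the gradient of q w.r.t. w is the feature vector.
    return [w + alpha * td_error * x for w, x in zip(weights, features)]


class UCB1Selector:
    """UCB1 bandit over a fixed set of arms (here: low-level heuristics)."""

    def __init__(self, n_arms: int, c: float = math.sqrt(2)) -> None:
        self.counts = [0] * n_arms
        self.means = [0.0] * n_arms
        self.c = c

    def select(self) -> int:
        # Try every arm once before applying the confidence bound.
        for arm, count in enumerate(self.counts):
            if count == 0:
                return arm
        total = sum(self.counts)
        scores = [
            mean + self.c * math.sqrt(math.log(total) / count)
            for mean, count in zip(self.means, self.counts)
        ]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, arm: int, reward: float) -> None:
        # Incremental running mean of the observed reward for this arm.
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]
```

In a dispatching loop, one would presumably call `select()` to pick which of the seven heuristics proposes the next assignment, score the outcome, and pass that score to `update()`. A cost-minimizing setting such as the one the abstract describes would use negative cost as the reward in both components.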
