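Reinforcement learning
Benchmark (surveying)
Computer science
Function (biology)
Scalability
Order (exchange)
Operations research
Service (business)
Vehicle routing problem
Routing (electronic design automation)
Cost reduction
Linear programming
Benchmarking
Reduction (mathematics)
Generalization
Shortage (economics)
Bellman equation
Resource allocation
Flexibility (engineering)
Mathematical optimization
Value (mathematics)
Resource management (computing)
Theory (learning stability)
Resource (disambiguation)
Service level
Simulation-based optimization
Strategic planning
Operational efficiency
Multi-armed bandit
Optimization problem
Heuristic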
Authors
Ramón Auad, Felipe Lagos, Tomás Lagos
Identifier
DOI:10.1287/trsc.2025.0129
Abstract
The rapid growth of online meal delivery has introduced complex logistical challenges, where platforms must dynamically assign orders to couriers while accounting for demand uncertainty, courier autonomy, and service efficiency. Traditional dispatching methods, often focused on short-term cost minimization, fail to capture the long-term implications of assignment decisions on system-wide performance. This paper presents a novel hybrid framework that integrates reinforcement learning with hyper-heuristic optimization to improve sequential order assignment and routing decisions in meal delivery operations. Our approach combines n-step state-action-reward-state-action (SARSA) with value function approximation and a multiarmed bandit-based hyper-heuristic incorporating seven specialized low-level heuristics. It explicitly models the evolving system state, enabling dispatching policies that balance immediate efficiency with future operational performance. By employing scalable linear value function approximation, we enhance policy learning in high-dimensional environments while maintaining generalization across states and actions. Using real operational data from the food delivery platform Meituan, we develop a comprehensive simulation environment that captures order dynamics, courier behavior, and service times. Through extensive computational experiments, we demonstrate that our framework significantly outperforms traditional benchmark policies, achieving a 12% cost reduction through strategic order postponement. Our results reveal that the largest improvements occur during high-demand periods with courier shortages and that a 10% increase in courier availability yields greater benefits than algorithmic improvements alone. The proposed methodology effectively balances immediate operational efficiency with long-term performance while providing valuable insights for meal delivery platforms regarding courier fleet management and order assignment strategies.
History: This paper has been accepted for the Transportation Science Special Issue The First INFORMS TSL Data-Driven Research Challenge.
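The n-step SARSA update with linear value function approximation named in the abstract can be illustrated with the textbook semi-gradient form. The following is a minimal sketch, not the authors' implementation: the function name `n_step_sarsa_update`, its parameters, and the feature vectors are hypothetical, and the paper's actual features, step size, and reward (cost) definition are not given in the abstract.

```python
import numpy as np

def n_step_sarsa_update(w, features, rewards, gamma, alpha, bootstrap_features=None):
    """One semi-gradient n-step SARSA update for a linear q(s, a) = w . x(s, a).

    All names here are illustrative assumptions, not taken from the paper.
    features:           x(S_t, A_t), features of the state-action pair being updated
    rewards:            [R_{t+1}, ..., R_{t+n}], the n observed rewards
    bootstrap_features: x(S_{t+n}, A_{t+n}), or None if the episode ended first
    """
    # Discounted sum of the n observed rewards.
    G = sum((gamma ** k) * r for k, r in enumerate(rewards))
    # Bootstrap from the current estimate at step t+n unless the episode ended.
    if bootstrap_features is not None:
        G += (gamma ** len(rewards)) * float(w @ bootstrap_features)
    # For a linear q, the gradient with respect to w is the feature vector itself.
    td_error = G - float(w @ features)
    return w + alpha * td_error * features
```

In a dispatching setting the rewards would typically be negative costs, so that maximizing return corresponds to minimizing long-run cost; that mapping is an assumption of this sketch.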