A reinforcement learning-based hyper-heuristic for AGV task assignment and route planning in parts-to-picker warehouses

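Reinforcement learning; Task (project management); Heuristic; Computer science; Reinforcement (rebar); Route planning; Artificial intelligence; Transport engineering; Engineering; Systems engineering; Structural engineering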

Authors

Kunpeng Li,Tengbo Liu,P.N. Ram Kumar,Xuefang Han

Source

Journal: Transportation Research Part E: Logistics and Transportation Review [Elsevier BV]
Date: 2024-04-04 Volume/issue: 185: 103518-103518 Citations: 45

Identifiers

DOI: 10.1016/j.tre.2024.103518

Abstract

Globally, e-commerce warehouses have begun implementing robotic mobile fulfillment systems (RMFS), which can improve order-picking efficiency by using automated guided vehicles (AGVs) to realize operations from parts to pickers. AGVs depart from their initial points, move to a target rack position, and subsequently transport racks to picking stations. The AGVs return the racks to their original positions after the workers pick them up. When all tasks are completed, the AGVs return to their starting point. In this context, the main challenge is the task assignment and route planning of multiple AGVs to minimize travel times. We formulate a mixed-integer linear programming (MILP) model with valid inequalities to solve small problem instances optimally. We introduce a reinforcement learning (RL)-based hyper-heuristic (HH) framework to solve large instances to near-optimality. A typical HH framework comprises two levels: high-level heuristics (HLH) and low-level heuristics (LLH). The framework starts from an initial solution and improves iteratively through LLHs, while the HLH invokes a selection strategy and an acceptance criterion to generate a new solution. We propose a novel selection strategy based on the improved Multi-Armed Bandits algorithm called Co-SLMAB and Exponential Monte Carlo with counters (EMCQ) as the acceptance criterion. The corresponding collision avoidance rules are then formulated for different conflicts to construct a conflict-free traveling route for AGVs. Besides testing the proposed framework's effectiveness in real-life warehouse layouts, we perform extensive computational experiments and a thorough sensitivity analysis. 
The results show that (i) the proposed valid inequalities aid in obtaining better lower bounds and significantly speed up the solution process; (ii) the Co-SLMAB-HH framework is quite competitive compared to CPLEX, outperforming the other tested hyper-heuristics and the problem-specific heuristic regarding convergence and computation time; and (iii) a pool of LLHs consisting of a wide range of different operators is advantageous over a limited set of simple operators while solving problems using hyper-heuristics.
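
The two-level loop described in the abstract (an HLH that selects an LLH, applies it, and then decides whether to accept the result) can be illustrated with a minimal sketch. This is not the authors' method: plain UCB1 stands in for Co-SLMAB, a fixed-temperature exponential acceptance rule stands in for EMCQ, and the problem is reduced to a single vehicle visiting tasks from a depot, with no collision avoidance. All function names, the `temperature` parameter, and the binary reward are assumptions made for illustration.

```python
import math
import random


def tour_cost(order, dist):
    """Length of a closed tour that starts and ends at node 0 (the depot)."""
    path = [0] + list(order) + [0]
    return sum(dist[a][b] for a, b in zip(path, path[1:]))


def swap(order, rng):
    """LLH: exchange two randomly chosen tasks."""
    s = list(order)
    i, j = rng.sample(range(len(s)), 2)
    s[i], s[j] = s[j], s[i]
    return s


def reverse_segment(order, rng):
    """LLH: reverse a random segment of the task sequence."""
    s = list(order)
    i, j = sorted(rng.sample(range(len(s)), 2))
    s[i:j + 1] = reversed(s[i:j + 1])
    return s


def relocate(order, rng):
    """LLH: remove one task and reinsert it at a random position."""
    s = list(order)
    task = s.pop(rng.randrange(len(s)))
    s.insert(rng.randrange(len(s) + 1), task)
    return s


def hyper_heuristic(dist, iterations=2000, temperature=1.0, seed=0):
    """Selection hyper-heuristic: UCB1 selection (assumed stand-in for
    Co-SLMAB) plus exponential acceptance (assumed stand-in for EMCQ)."""
    rng = random.Random(seed)
    llhs = [swap, reverse_segment, relocate]
    counts = [0] * len(llhs)      # times each LLH was applied
    rewards = [0.0] * len(llhs)   # accumulated reward per LLH

    # Initial solution: a random ordering of tasks 1..n-1.
    current = list(range(1, len(dist)))
    rng.shuffle(current)
    cur_cost = tour_cost(current, dist)
    best, best_cost = current, cur_cost

    for t in range(1, iterations + 1):
        # High level, selection: try each LLH once, then pick by UCB1 score.
        untried = [k for k, c in enumerate(counts) if c == 0]
        if untried:
            k = untried[0]
        else:
            k = max(
                range(len(llhs)),
                key=lambda a: rewards[a] / counts[a]
                + math.sqrt(2 * math.log(t) / counts[a]),
            )

        # Low level: apply the chosen operator to the current solution.
        candidate = llhs[k](current, rng)
        cand_cost = tour_cost(candidate, dist)
        delta = cand_cost - cur_cost

        # Reward 1 if the operator improved the current solution, else 0.
        counts[k] += 1
        rewards[k] += 1.0 if delta < 0 else 0.0

        # High level, acceptance: always accept non-worsening moves;
        # accept worsening moves with probability exp(-delta / temperature).
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, cur_cost = candidate, cand_cost
            if cur_cost < best_cost:
                best, best_cost = current, cur_cost

    return best, best_cost
```

Calling `hyper_heuristic(dist)` with a square distance matrix whose row 0 is the depot returns the best task order found and its tour length. Extending the sketch toward the setting in the abstract would require multiple vehicles, rack pick-up and return legs, and conflict resolution, none of which are modeled here.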
