An online reinforcement learning approach to charging and order-dispatching optimization for an e-hailing electric vehicle fleet

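Keywords: Reinforcement learning; Markov decision process; Computer science; Dynamic programming; Heuristic; Revenue; Operations research; Mathematical optimization; Trajectory; Workload; Fleet management; Electric vehicle; Stochastic programming; Markov process; Total revenue; Order (exchange); Matching (statistics); Artificial intelligence; Engineering; Power (physics); Economics; Mathematics; Physics; Finance; Quantum mechanics; Telecommunications; Statistics; Accounting; Algorithm; Astronomy; Operating system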

Authors

Pengyu Yan,Kaize Yu,Xiuli Chao,Zhibin Chen

Source

Journal: European Journal of Operational Research [Elsevier]
Date: 2023-11-01 Volume (issue): 310 (3): 1218-1233 Citations: 4

Identifier

DOI: 10.1016/j.ejor.2023.03.039

Abstract

Given the uncertainty of orders and the dynamically changing workload of charging stations, deciding how to dispatch and charge electric vehicle (EV) fleets is a significant challenge for e-hailing platforms. The common practice is to dispatch EVs to serve orders using heuristic matching methods while leaving charging decisions to individual drivers, who rely on their own experience; this may compromise the platform's performance. This study proposes a Markov decision process to jointly optimize the charging and order-dispatching schemes for an e-hailing EV fleet that provides pick-up services for passengers only from a designated transportation hub (i.e., no pick-up from other locations). The objective is to maximize the total revenue of the fleet over a finite horizon. The complete state transition equations of the EV fleet are formulated to track the state of charge of the vehicles' batteries. To learn the charging and order-dispatching policy in a dynamic stochastic environment, an online approximation algorithm is developed that integrates the model-based reinforcement learning (RL) framework with a novel SARSA(Δ)-sample average approximation (SAA) architecture. In contrast to model-free RL algorithms and approximate dynamic programming (ADP), the algorithm explores high-quality decisions through an SAA model with empirical state transitions and exploits the best decisions found so far through SARSA(Δ) sample-trajectory updating. Computational results based on a real case show that, compared with the existing heuristic method and the ADP approach in the literature, the proposed approach increases daily revenue by an average of 31.76% and 14.22%, respectively.
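For readers unfamiliar with the SARSA family named in the abstract, the following is a minimal sketch of the textbook one-step tabular SARSA update. It is not the paper's SARSA(Δ)-SAA algorithm, whose details the abstract does not give. The state and action names ("low_battery", "charge", "dispatch"), the reward value, and the step-size and discount parameters are all hypothetical, chosen only for illustration.

```python
import random
from collections import defaultdict


def epsilon_greedy(q, state, actions, epsilon, rng):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q[(state, a)])


def sarsa_update(q, s, a, reward, s_next, a_next, alpha=0.1, gamma=0.95):
    """Textbook one-step SARSA:
    Q(s, a) <- Q(s, a) + alpha * (r + gamma * Q(s', a') - Q(s, a)).
    """
    td_target = reward + gamma * q[(s_next, a_next)]
    q[(s, a)] += alpha * (td_target - q[(s, a)])
    return q[(s, a)]


# Q-table with all values initialised to zero.
q = defaultdict(float)

# Hypothetical toy transition: an EV with a low battery charges, earns a
# reward of 1.0, and is then dispatched from the high-battery state.
new_value = sarsa_update(
    q, "low_battery", "charge", 1.0, "high_battery", "dispatch",
    alpha=0.5, gamma=0.9,
)

# With epsilon = 0 the policy is purely greedy with respect to the Q-table.
greedy_action = epsilon_greedy(
    q, "low_battery", ["dispatch", "charge"], 0.0, random.Random(0)
)
```

The update is on-policy: the target uses the action actually chosen in the next state rather than the maximizing action, which is what distinguishes SARSA from Q-learning.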
