Trajectory
Computer science
Reinforcement learning
Pedestrian
Artificial intelligence
Stability (learning theory)
Machine learning
Process (computing)
Obstacle
Generalizability theory
Convergence (economics)
Function (biology)
Local optimum
Engineering
Mathematics
Geography
Transport engineering
Biology
Statistics
Operating system
Evolutionary biology
Economic growth
Physics
Economics
Archaeology
Astronomy
Authors
Senlin Mu, Xiao Huang, Moyang Wang, Di Zhang, Dong Xu, Xiang Li
Identifier
DOI:10.1007/s10707-023-00486-5
Abstract
Most traditional pedestrian simulation methods suffer from short-sightedness: they choose the best action at the moment without considering potential congestion in the future. To address this issue, we propose a hierarchical model that combines Deep Reinforcement Learning (DRL) with the Optimal Reciprocal Collision Avoidance (ORCA) algorithm to optimize the decision process of pedestrian simulation. For complex scenarios prone to local optima, we add an expert-trajectory imitation term to the reward function, aiming to improve pedestrians' exploration efficiency by designing simple expert trajectory guidance lines without constructing databases of expert examples or collecting a priori datasets. The experimental results show that the proposed method exhibits strong stability and generalizability, evidenced by its ability to adjust behavioral strategies earlier in anticipation of upcoming congestion. The overall simulation time for each scenario is reduced by approximately 8-44% compared with traditional methods. After including the expert trajectory guidance, the convergence speed of the model is greatly improved: the simulation time from the first exploration to the global maximum cumulative reward is reduced by 56-64%. The expert trajectory establishes macro-level rules while preserving space for free exploration, avoiding local dilemmas and improving training efficiency.
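The abstract describes shaping the DRL reward with an expert-trajectory imitation term measured against a simple guidance line. A minimal sketch of that idea, assuming a 2D agent and hypothetical weights and names (`w_progress`, `w_imitate`, the guidance segment endpoints) that are illustrative rather than taken from the paper:

```python
import numpy as np

def shaped_reward(pos, prev_pos, goal, guide_a, guide_b,
                  w_progress=1.0, w_imitate=0.5):
    """Sketch of a reward combining goal progress with an
    expert-trajectory imitation term: the agent is rewarded for
    moving toward the goal and penalized for deviating from a
    straight guidance line segment guide_a -> guide_b.
    Weights and structure are assumptions, not the paper's exact form."""
    # Progress term: reduction in Euclidean distance to the goal this step.
    progress = np.linalg.norm(prev_pos - goal) - np.linalg.norm(pos - goal)
    # Imitation term: perpendicular distance to the closest point
    # on the guidance segment (clamped projection).
    ab = guide_b - guide_a
    t = np.clip(np.dot(pos - guide_a, ab) / np.dot(ab, ab), 0.0, 1.0)
    deviation = np.linalg.norm(pos - (guide_a + t * ab))
    return w_progress * progress - w_imitate * deviation
```

An agent stepping along the guidance line toward the goal collects the full progress reward, while one that strays pays a deviation penalty proportional to its distance from the line, which is how a guidance line can steer exploration without a database of expert examples.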