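Reinforcement learning
Markov decision process
Computer science
Set (abstract data type)
Markov process
Process (computing)
Autonomous agent
State space
Artificial intelligence
Machine learning
Mathematics
Statistics
Operating system
Programming language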
Authors
Xinfeng Zhang, Lin Wu, Huan Liu, Yajun Wang, Hao Li, Bin Xu
Identifier
DOI:10.1109/jiot.2023.3304890
Abstract
To improve the decision success rate of multiagent reinforcement learning for autonomous vehicles merging from highway ramps, the independent proximal policy optimization (IPPO) method is presented. A Markov decision process (MDP) model of autonomous vehicle behavioral decision making is developed, including the design of the state space, action space, and reward function. IPPO builds on the proximal policy optimization algorithm and uses independent learning with parameter sharing; on this basis, a decision-making model for autonomous driving behavior is built. Simulation experiments are run in a highway ramp scenario. The results indicate that IPPO significantly increases the decision success rate of autonomous vehicles in the ramp merging task. Compared with the MAACKTR and GPPO algorithms, IPPO also achieves a higher average reward and completes the ramp merge more quickly.
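The abstract does not give the loss function or hyperparameters, so the following is only a minimal sketch of the standard PPO clipped surrogate objective, applied in the way "independent learning with parameter sharing" is commonly understood: each agent contributes samples from its own experience, and one shared policy is scored on the pooled samples. The names `clipped_surrogate` and `ippo_objective`, the sample layout, and the clip range `eps=0.2` are illustrative assumptions, not details from the paper.

```python
import math


def clipped_surrogate(ratio, advantage, eps=0.2):
    """Standard PPO clipped surrogate term for a single sample.

    ratio     : pi_new(a|s) / pi_old(a|s)
    advantage : advantage estimate for that sample
    eps       : clip range (0.2 is a common default, assumed here)
    """
    clipped_ratio = max(1.0 - eps, min(1.0 + eps, ratio))
    return min(ratio * advantage, clipped_ratio * advantage)


def ippo_objective(per_agent_samples, eps=0.2):
    """Mean clipped surrogate over all agents' samples.

    per_agent_samples maps a (hypothetical) agent id to a list of
    (new_log_prob, old_log_prob, advantage) tuples. Each agent's
    advantages come from its own experience (independent learning),
    while the log-probabilities come from one policy shared by all
    agents (parameter sharing), so the samples are simply pooled.
    """
    terms = []
    for samples in per_agent_samples.values():
        for new_log_prob, old_log_prob, advantage in samples:
            ratio = math.exp(new_log_prob - old_log_prob)
            terms.append(clipped_surrogate(ratio, advantage, eps))
    return sum(terms) / len(terms)
```

In a training loop this quantity would be maximized with respect to the shared policy parameters; the clipping removes the incentive to move the probability ratio outside `[1 - eps, 1 + eps]` in a single update.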