Nash equilibrium
Reinforcement learning
ε-equilibrium
Best response
Computer science
Minimax
Mathematical optimization
Correlated equilibrium
Gradient descent
Equilibrium selection
Game theory
Mathematical economics
Artificial intelligence
Mathematics
Repeated game
Artificial neural network
Authors
Zhen-yu Li, Ya-Zhong Luo
Identifier
DOI:10.1109/tnnls.2024.3351631
Abstract
Nash equilibrium is a central solution concept representing the optimal strategy in a noncooperative multiagent system. This study presents two deep reinforcement learning (DRL) algorithms for solving the Nash equilibrium of differential games. Both build on the distributed distributional deep deterministic policy gradient (D4PG) algorithm, a one-sided learning method, which we modify into a two-sided adversarial learning method. The first is D4PG for games (D4P2G), which directly applies an adversarial play framework based on D4PG; a simultaneous policy gradient descent (SPGD) method optimizes the policies of the players with conflicting objectives. The second, the distributional deep deterministic symplectic policy gradient (D4SPG) algorithm, is our main contribution: it introduces a minimax learning framework that combines the critics of the two players and a symplectic policy gradient adjustment method to find a better policy gradient. Simulations show that both algorithms converge to the Nash equilibrium in most cases, but D4SPG learns the Nash equilibrium more accurately and efficiently, especially in Hamiltonian games. Moreover, it can handle games with complex dynamics, which is challenging for traditional methods.
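The contrast between plain simultaneous gradient descent and a symplectic adjustment can be illustrated on a toy bilinear zero-sum game. The sketch below is an assumption-laden simplification: it uses the generic symplectic gradient adjustment (adding a term built from the antisymmetric part of the game Jacobian), not necessarily the paper's exact D4SPG update, and the game, step sizes, and function names are all illustrative.

```python
import numpy as np

# Toy zero-sum bilinear game f(x, y) = x * y:
# player 1 picks x to minimize f, player 2 picks y to maximize f.
# The unique Nash equilibrium is (0, 0).

def simultaneous_grad(z):
    # Descent direction for both players stacked: (df/dx, -df/dy).
    x, y = z
    return np.array([y, -x])

def step_spgd(z, lr=0.1):
    # Plain simultaneous policy gradient descent: both players step at once.
    return z - lr * simultaneous_grad(z)

def step_symplectic(z, lr=0.1, lam=1.0):
    # Generic symplectic gradient adjustment (illustrative, not the paper's
    # exact rule): add lam * A^T @ xi, where A is the antisymmetric part of
    # the Jacobian of the stacked gradient xi.
    xi = simultaneous_grad(z)
    J = np.array([[0.0, 1.0], [-1.0, 0.0]])  # Jacobian of xi for this game
    A = 0.5 * (J - J.T)                       # antisymmetric part
    return z - lr * (xi + lam * (A.T @ xi))

z_plain = np.array([1.0, 1.0])
z_sym = np.array([1.0, 1.0])
for _ in range(200):
    z_plain = step_spgd(z_plain)
    z_sym = step_symplectic(z_sym)

# Plain simultaneous descent spirals outward on this game (norm grows),
# while the adjusted update contracts toward the Nash point at the origin.
print(np.linalg.norm(z_plain))
print(np.linalg.norm(z_sym))
```

On this game the plain update rotates around the equilibrium and slowly diverges, while the adjusted update damps the rotation and converges, which is the qualitative behavior the abstract attributes to the symplectic policy gradient adjustment.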