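Reinforcement learning
Nash equilibrium
Markov chain
Mathematical economics
Markov perfect equilibrium
Markov decision process
Computer science
Markov process
Economics
Artificial intelligence
Mathematics
Machine learning
Statistics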
Authors
Alireza Ramezani Moghaddam, Hamed Kebriaei
Identifier
DOI:10.1109/tsmc.2024.3462762
Abstract
This article studies noncooperative multiagent reinforcement learning (MARL), in which selfish agents play a general-sum Markov game. We consider a setting where no agent has explicit information about the model of the dynamic environment, the models of the other agents, or even its own cost function. We propose an actor–critic MARL algorithm to learn the Nash equilibrium (NE) policies of the agents. The main contribution of this article is to extend NE-seeking methods to incomplete-information stochastic nonzero-sum games. Based on this formulation and under some conventional assumptions, we prove that, with linear function approximators, the agents' policies converge to an approximation of a first-order NE point of the game. Finally, as a case study, the framework is applied to a Cloud Radio Access Network.
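As a rough illustration of the setting described in the abstract, the following is a minimal sketch of independent actor-critic learners in a toy two-agent general-sum Markov game. It is not the authors' algorithm and carries no convergence guarantee. The environment `toy_env_step`, the cost values, the step sizes, and the discount factor are all hypothetical, and one-hot state features are used as the simplest special case of a linear function approximator. Each agent updates only from its own realized cost and the observed state, without access to the transition model or to the other agent's policy.

```python
import math
import random

N_STATES, N_ACTIONS, N_AGENTS = 2, 2, 2
GAMMA = 0.9  # discount factor (assumed for illustration)

def softmax(prefs):
    """Numerically stable softmax over a list of action preferences."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]

def toy_env_step(state, actions, rng):
    """Hypothetical environment, hidden from the agents.

    Returns the per-agent costs and the next state.
    """
    a0, a1 = actions
    # General-sum costs: own effort plus congestion when both pick action 1.
    congestion = 1.0 if (a0 == 1 and a1 == 1) else 0.0
    costs = [0.5 * a0 + congestion + 0.2 * state,
             0.3 * a1 + congestion + 0.1 * state]
    if rng.random() < 0.5:
        next_state = rng.randrange(N_STATES)
    else:
        next_state = (a0 + a1) % N_STATES
    return costs, next_state

def train(steps=5000, alpha_critic=0.05, alpha_actor=0.01, seed=0):
    rng = random.Random(seed)
    # Linear approximators over one-hot state features:
    # w[i][s] estimates agent i's expected discounted cost from state s,
    # theta[i][s][a] holds agent i's softmax action preferences.
    w = [[0.0] * N_STATES for _ in range(N_AGENTS)]
    theta = [[[0.0] * N_ACTIONS for _ in range(N_STATES)]
             for _ in range(N_AGENTS)]
    state = 0
    for _ in range(steps):
        probs = [softmax(theta[i][state]) for i in range(N_AGENTS)]
        actions = [rng.choices(range(N_ACTIONS), weights=probs[i])[0]
                   for i in range(N_AGENTS)]
        costs, next_state = toy_env_step(state, actions, rng)
        for i in range(N_AGENTS):
            # Critic: TD(0) update using only agent i's own realized cost.
            td_error = costs[i] + GAMMA * w[i][next_state] - w[i][state]
            w[i][state] += alpha_critic * td_error
            # Actor: policy-gradient step; the minus sign lowers expected cost.
            for a in range(N_ACTIONS):
                grad_log = (1.0 if a == actions[i] else 0.0) - probs[i][a]
                theta[i][state][a] -= alpha_actor * td_error * grad_log
        state = next_state
    policies = [[softmax(theta[i][s]) for s in range(N_STATES)]
                for i in range(N_AGENTS)]
    return policies, w
```

Calling `train()` returns each agent's per-state action probabilities together with the critic weights. Whether such independent updates approach an equilibrium depends on the game and the step sizes; the sketch only shows the information structure, in which every update uses local observations alone.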