Stochasticity
Function (biology)
Categorical variable
Reinforcement learning
Value (mathematics)
Computer science
Generalization
Expected value
Distribution (mathematics)
Artificial intelligence
Mathematics
Machine learning
Statistics
Evolutionary biology
Biology
Mathematical analysis
Authors
Jian Zhao, Mingyu Yang, Youpeng Zhao, Xunhan Hu, Wengang Zhou, Houqiang Li
Identifier
DOI: 10.1109/tg.2023.3310150
Abstract
In cooperative multi-agent tasks, a team of agents jointly interacts with an environment by taking actions, receiving a team reward, and observing the next state. During these interactions, the uncertainty of the environment and reward inevitably induces stochasticity in the long-term returns, and this randomness can be exacerbated as the number of agents increases. However, such randomness is ignored by most existing value-based multi-agent reinforcement learning (MARL) methods, which model only the expectation of the Q-value for both individual agents and the team. Compared to using the expectations of the long-term returns, it is preferable to model the stochasticity directly by estimating the returns through distributions. With this motivation, this work proposes a novel value-based MARL framework from a distributional perspective, i.e., parameterizing the value function via a Mixture of Categorical distributions for MARL (MCMARL). Specifically, we model both the individual Q-values and the global Q-value with categorical distributions. To integrate categorical distributions, we define five basic operations on the distribution, which allow the generalization of expected value function factorization methods (e.g., VDN and QMIX) to their MCMARL variants. We further prove that our MCMARL framework satisfies the Distributional-Individual-Global-Max (DIGM) principle with respect to the expectation of the distribution, which guarantees consistency between the joint and individual greedy action selections on the global Q-value and the individual Q-values. Empirically, we evaluate MCMARL on both a stochastic matrix game and a challenging set of StarCraft II micromanagement tasks, showing the efficacy of our framework.
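The abstract describes modeling each Q-value as a categorical distribution over return atoms and requiring DIGM with respect to the distribution's expectation. The sketch below is illustrative only, not the authors' code or their five operations: the support range, atom count, and the convolution-based additive mixture are assumptions chosen to show how expectations of categorical return distributions behave under a VDN-style sum of independent per-agent returns.

```python
# Minimal sketch: categorical return distributions, their expectations, and a
# VDN-style additive mixture. All constants (support range, atom count) and the
# convolution-based "sum" are illustrative assumptions, not the paper's method.
import numpy as np

N_ATOMS = 51                                 # number of categories (assumed)
V_MIN, V_MAX = -10.0, 10.0                   # support range (assumed)
ATOMS = np.linspace(V_MIN, V_MAX, N_ATOMS)   # fixed, evenly spaced atoms
DZ = ATOMS[1] - ATOMS[0]

def expected_q(probs: np.ndarray) -> np.ndarray:
    """Expectation of categorical return distributions.
    probs has shape (..., N_ATOMS) and sums to 1 along the last axis."""
    return (probs * ATOMS).sum(axis=-1)

def greedy_action(probs_per_action: np.ndarray) -> int:
    """Greedy action w.r.t. the expectation of each action's distribution,
    i.e. the quantity a DIGM-style condition is stated over."""
    return int(np.argmax(expected_q(probs_per_action)))

def additive_mixture(p: np.ndarray, q: np.ndarray):
    """Distribution of the sum of two independent categorical returns.
    On an evenly spaced support, the PMF of X + Y is the discrete convolution
    of the two PMFs over a wider support; expectations add exactly."""
    support = 2.0 * V_MIN + DZ * np.arange(2 * N_ATOMS - 1)
    return support, np.convolve(p, q)

# Toy check with two agents, three actions each, random normalized logits.
rng = np.random.default_rng(0)
logits = rng.normal(size=(2, 3, N_ATOMS))
probs = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)

a0, a1 = greedy_action(probs[0]), greedy_action(probs[1])
support, joint = additive_mixture(probs[0, a0], probs[1, a1])

print("individual expected Qs:", expected_q(probs[0, a0]), expected_q(probs[1, a1]))
print("expected joint Q (sum):", (joint * support).sum())  # equals their sum
```

Because expectations are additive under this mixture, choosing each agent's greedy action by its own expected Q also maximizes the expectation of the summed distribution, which is the kind of consistency between individual and joint greedy selection that the DIGM principle formalizes.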