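Mathematical economics
Nash equilibrium
Perfect information
Convergence (economics)
Imperfect
Complete information
Best response
Computer science
Mathematics
Mathematical optimization
Economy
Linguistics
Economic growth
Philosophy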
Authors
Runyu Lu, Yuanheng Zhu, Dongbin Zhao, Yu Liu, You He
Identifier
DOI: 10.1109/tnnls.2024.3516693
Abstract
Imperfect information and multiple players are two common features of real-world games. However, few existing game-theoretic methods can find Nash equilibria in multiplayer imperfect-information games (IIGs). Moreover, the commonly used methods that rely on average-iterate convergence are ill-suited to deep reinforcement learning (DRL), which is widely applied to large-scale problems, because preserving average policies under function approximation is costly. To address these problems, we construct a continuous-time dynamic named imperfect-information exponential-decay score-based learning (IESL) by considering the Nash distribution, a type of quantal response equilibrium (QRE), in IIGs. Theoretically, we prove the last-iterate convergence of IESL to approximate Nash equilibria in multiplayer IIGs under the assumption of individual concavity. Empirically, we verify that IESL converges in six poker scenarios and that its final NashConv in multiplayer Leduc Hold'em is lower than that of the comparison methods, including counterfactual regret minimization (CFR), replicator dynamics (RDs), and their variants. Compared with existing equilibrium-finding algorithms in multiplayer normal-form games (NFGs), IESL also performs more stably. In addition, we observe a trade-off between the difficulty of IESL's last-iterate convergence and the NashConv of the convergent policies, which aligns with our convergence analysis based on the hypomonotonicity of the game.
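To make the ideas of a Nash distribution (logit QRE), an exponential-decay score dynamic, and NashConv concrete, here is a minimal sketch in a toy two-player normal-form game. It is not the authors' IESL, which targets imperfect-information games; it only illustrates the generic idea under stated assumptions. Each player keeps a score per action, plays `softmax(scores)`, and updates scores by an Euler step of ds/dt = u(x) - T·s. A rest point satisfies s = u/T, so the policy equals softmax(u/T), which is a logit QRE with temperature T. The function names, the payoff matrices, the temperature, the step size, and the Euler discretization are all hypothetical choices made for this illustration.

```python
import math


def softmax(scores):
    """Logit choice map: turn a score vector into a mixed strategy."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def payoff_vectors(A, B, x, y):
    """Expected payoff of each pure action against the opponent's mixed strategy."""
    u1 = [sum(A[i][j] * y[j] for j in range(len(y))) for i in range(len(x))]
    u2 = [sum(B[i][j] * x[i] for i in range(len(x))) for j in range(len(y))]
    return u1, u2


def nash_conv(A, B, x, y):
    """Sum over players of (best-response payoff - current expected payoff)."""
    u1, u2 = payoff_vectors(A, B, x, y)
    v1 = sum(p * u for p, u in zip(x, u1))
    v2 = sum(q * u for q, u in zip(y, u2))
    return (max(u1) - v1) + (max(u2) - v2)


def exp_decay_scores(A, B, temperature=0.5, step=0.05, iters=4000):
    """Hypothetical Euler discretization of ds/dt = u(softmax(s)) - temperature * s."""
    s1 = [0.0] * len(A)
    s2 = [0.0] * len(A[0])
    for _ in range(iters):
        x, y = softmax(s1), softmax(s2)
        u1, u2 = payoff_vectors(A, B, x, y)
        # Scores accumulate payoffs while decaying exponentially at rate `temperature`.
        s1 = [s + step * (u - temperature * s) for s, u in zip(s1, u1)]
        s2 = [s + step * (u - temperature * s) for s, u in zip(s2, u2)]
    # Last-iterate policies, not averages over iterations.
    return softmax(s1), softmax(s2)


if __name__ == "__main__":
    # Hypothetical zero-sum 2x2 game; its unique Nash equilibrium is x = y = (0.4, 0.6).
    A = [[2.0, -1.0], [-1.0, 1.0]]
    B = [[-a for a in row] for row in A]
    x, y = exp_decay_scores(A, B)
    print("policies:", x, y)
    print("NashConv:", nash_conv(A, B, x, y))
```

In this sketch the returned policies are the last iterate, with no policy averaging. As `temperature` approaches 0 the rest point approaches the Nash equilibrium, but the decay term that damps the iteration also weakens, so a smaller `step` may be needed for the Euler iteration to settle.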