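Mathematical economics
Computer science
Monotone polygon
Nash equilibrium
Stochastic game
Game theory
Sequence (biology)
Limit (mathematics)
Mathematics
Geometry
Genetics
Biology
Mathematical analysis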
Authors
Benoît Duvocelle,Panayotis Mertikopoulos,Mathias Staudigl,Dries Vermeulen
Identifier
DOI:10.1287/moor.2022.1283
Abstract
We examine the long-run behavior of multi-agent online learning in games that evolve over time. Specifically, we focus on a wide class of policies based on mirror descent, and we show that the induced sequence of play (a) converges to Nash equilibrium in time-varying games that stabilize in the long run to a strictly monotone limit; and (b) stays asymptotically close to the evolving equilibrium of the sequence of stage games (assuming they are strongly monotone). Our results apply to both gradient-based and payoff-based feedback (i.e., the "bandit feedback" case, where players only get to observe the payoffs of their chosen actions).
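Mirror-descent policies update each player's action by taking a step along that player's individual payoff gradient and then mapping the result back to the feasible action set. The sketch below is a minimal illustration, assuming a Euclidean regularizer (so the mirror step reduces to projected gradient ascent) and exact gradient feedback, not the bandit feedback the paper also covers. The two-player quadratic game, the intercept `b(t)`, the coupling `C`, the step-size schedule, and the helper names `payoff_gradient`, `stage_equilibrium` and `mirror_descent` are all hypothetical choices made for illustration, not the authors' construction. Each stage game is strongly monotone, and the sequence converges to a limit game whose equilibrium is `1 / (1 + C)`.

```python
import numpy as np

# Hypothetical time-varying 2-player game (illustrative, not from the paper):
#   u_i(x; t) = b(t) * x_i - 0.5 * x_i**2 - C * x_i * x_j,   x_i in [LO, HI]
# Individual payoff gradients: v_i(x; t) = b(t) - x_i - C * x_j.
# For |C| < 1 each stage game is strongly monotone (modulus 1 - |C|).
C = 0.5
LO, HI = 0.0, 2.0


def b(t):
    """Assumed time-varying intercept; b(t) -> 1 as t -> infinity."""
    return 1.0 + 1.0 / (t + 1)


def payoff_gradient(x, t):
    """Vector of individual payoff gradients v(x; t) for both players."""
    return b(t) - x - C * x[::-1]


def stage_equilibrium(t):
    """Interior Nash equilibrium of the stage game at time t: x_i = b(t) / (1 + C)."""
    return np.full(2, b(t) / (1 + C))


def mirror_descent(T=5000, x0=(0.0, 2.0), step=lambda t: 1.0 / np.sqrt(t + 1)):
    """Euclidean mirror descent, i.e. projected gradient ascent on payoffs.

    With the squared-norm regularizer the mirror step is
    x <- Proj_[LO, HI](x + gamma_t * v(x; t)), applied per player.
    Returns the final action profile and, for each round t, the distance
    between the updated profile and the stage-t equilibrium.
    """
    x = np.array(x0, dtype=float)
    gaps = []
    for t in range(T):
        x = np.clip(x + step(t) * payoff_gradient(x, t), LO, HI)
        gaps.append(np.linalg.norm(x - stage_equilibrium(t)))
    return x, gaps


if __name__ == "__main__":
    x_T, gaps = mirror_descent()
    print("final play:", x_T, "limit equilibrium:", 1 / (1 + C))
    print("tracking gap, first vs last round:", gaps[0], gaps[-1])
```

In this toy setting the tracking gap shrinks as the step sizes decay while the drift of the stage equilibria fades, which mirrors, under these assumptions only, the two behaviors (a) and (b) described in the abstract. Swapping in a different regularizer (for example an entropic one on a simplex) would change the projection step but not the overall loop structure.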