[Lecture] Last-Iterate Convergence for Learning in Markov Games with Bandit Feedback

Jan. 13, 2024

Speaker: Weiqiang Zheng, Yale University

Time: 16:00, January 13, 2024, GMT+8

Venue: Room 204, Courtyard No.5, Jingyuan

Abstract:

Online learning in multi-player games captures many modern machine learning applications, ranging from generative adversarial networks and adversarial training to robust optimization and multi-agent reinforcement learning. Understanding last-iterate convergence in games is crucial, since the last iterate characterizes the stability of the learning process and is widely used in practice. In this talk, we study the problem of learning in two-player zero-sum Markov games, focusing on developing decentralized learning algorithms with non-asymptotic last-iterate convergence rates to Nash equilibrium. We first present a simple algorithm with an $O(t^{-1/8})$ last-iterate convergence rate in two-player zero-sum matrix games with bandit feedback. To the best of our knowledge, this is the first result that obtains a finite-time last-iterate convergence rate given access to only bandit feedback. We then extend our result to the setting of two-player zero-sum Markov games, providing the first set of decentralized algorithms with non-asymptotic last-iterate/path convergence rates. This talk is based on joint work with Yang Cai, Haipeng Luo, and Chen-Yu Wei.

Source: Center on Frontiers of Computing Studies, PKU