Title:
Training a Population of Reinforcement Learning Agents

Yaodong Yang

Assistant Professor,
King's College London

Oliver Slumbers

PhD student,
University College London

Abstract: 
Recent advances in multiagent reinforcement learning have introduced a new learning paradigm built around population-based training. The idea is to consider the structure of games not at the micro-level of individual actions, but at the meta-level of which agent to train against in any given game or situation. A typical population-based training framework is the Policy Space Response Oracles (PSRO) method, in which, at each iteration, a new RL agent is discovered as the best response to a Nash mixture of agents from the opponent populations. PSRO methods can provably converge to Nash, correlated, and coarse correlated equilibria in N-player games; in particular, they have shown remarkable performance in solving large-scale zero-sum games. In this tutorial, I will introduce the basic idea of PSRO methods, the necessity of using PSRO methods to solve real-world games such as chess, recent results on solving N-player games and mean-field games, and how to promote behavioral diversity during PSRO training. Finally, I will introduce a meta-PSRO framework named Neural Auto-Curricula, in which an AI learns to learn a PSRO-like solution algorithm purely from data.
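
To make the PSRO iteration described above concrete, here is a minimal sketch on rock-paper-scissors. It is an illustration under simplifying assumptions, not the tutorial's implementation: the best-response "oracle" is an exact search over the three pure strategies (standing in for training an RL agent), a single population is used because the game is symmetric, and the Nash mixture of the restricted game is only approximated with fictitious play. The names PAYOFF, fictitious_play, best_response and psro are hypothetical.

```python
# Row player's payoff in rock-paper-scissors (0 = rock, 1 = paper, 2 = scissors).
PAYOFF = [[0, -1, 1],
          [1, 0, -1],
          [-1, 1, 0]]


def fictitious_play(M, iters=2000):
    """Approximate a Nash mixture of the zero-sum matrix game M (row payoffs)."""
    n = len(M)
    row_counts = [0] * n
    col_counts = [0] * n
    row_counts[0] = col_counts[0] = 1  # arbitrary initial play
    for _ in range(iters):
        # Row player maximises payoff against the column player's empirical play.
        row_vals = [sum(M[i][j] * col_counts[j] for j in range(n)) for i in range(n)]
        # Column player minimises the row payoff against the row player's empirical play.
        col_vals = [sum(M[i][j] * row_counts[i] for i in range(n)) for j in range(n)]
        row_counts[row_vals.index(max(row_vals))] += 1
        col_counts[col_vals.index(min(col_vals))] += 1
    total = sum(row_counts)
    return [c / total for c in row_counts]


def best_response(population, meta, payoff):
    """Exact oracle: the pure strategy with the highest payoff against the meta mixture."""
    values = [
        sum(meta[k] * payoff[a][population[k]] for k in range(len(population)))
        for a in range(len(payoff))
    ]
    return values.index(max(values))


def psro(payoff, initial=0, max_iters=10):
    """Grow a population by repeatedly best-responding to its (approximate) Nash mixture."""
    population = [initial]
    meta = [1.0]
    for _ in range(max_iters):
        br = best_response(population, meta, payoff)
        if br in population:
            break  # no new strategy improves on the current population
        population.append(br)
        # Restricted (meta) game: payoffs among the population members only.
        restricted = [[payoff[a][b] for b in population] for a in population]
        meta = fictitious_play(restricted)
    return population, meta


population, meta = psro(PAYOFF)
print("population:", population)
print("meta mixture:", [round(p, 3) for p in meta])
```

Starting from rock alone, the loop adds paper and then scissors, after which no pure strategy outside the population improves on the mixture and the loop stops. In a large game the exhaustive best-response search would be replaced by an RL training run, which is the step that makes PSRO applicable beyond small matrix games.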

  Yaodong is a machine learning researcher with ten years of working experience in both academia and industry, including finance and high-tech companies. He is currently an assistant professor at King's College London. His research focuses on reinforcement learning and multi-agent systems. He has published more than forty papers at top conferences and journals, and received the best system paper award at CoRL 2020 (first author) and the best blue-sky paper award at AAMAS 2021 (first author). Before KCL, he was a principal research scientist at Huawei UK, where he headed the multi-agent system team in London, working on autonomous driving applications. Before Huawei, he was a senior research manager at AIG, working on AI applications in finance. He holds a Ph.D. degree from University College London, an M.Sc. degree from Imperial College London, and a Bachelor's degree from the University of Science and Technology of China.

  Oliver Slumbers is a second-year PhD student at University College London (UCL) under the supervision of Dr. Jun Wang. His research is on multi-agent systems and reinforcement learning, specifically applications of game theory to population-based methods. Before beginning his PhD, Oliver received a BSc in Economics from the University of Warwick and an MSc in Computational Statistics and Machine Learning from UCL. He has first-author publications at the top-tier conferences NeurIPS and ICML, and co-authored the best blue-sky paper at AAMAS 2021.