Title: On the Value Function in Reinforcement Learning for Ridesharing

15:15 - 15:55, Dec. 16, 2022, UTC+8.

Zhiwei (Tony) Qin

Tony Qin is Principal Scientist at Lyft Rideshare Labs, working on core problems in ridesharing marketplace optimization. Previously, he was Principal Research Scientist and Director of the Decision Intelligence group at DiDi AI Labs and Staff Scientist in supply chain and inventory optimization at Walmart Global E-commerce. Tony received his Ph.D. in Operations Research from Columbia University. His research interests span optimization and machine learning, with a particular focus on reinforcement learning and its applications in operational optimization, digital marketing, and smart transportation. He is an Associate Editor of the ACM Journal on Autonomous Transportation Systems. He has published more than 40 papers in top-tier conferences and journals in machine learning and optimization, and has served as a Program Committee member for NeurIPS, ICML, AAAI, IJCAI, and KDD and as a referee for top journals. He and his team received the INFORMS Daniel H. Wagner Prize for Excellence in Operations Research Practice in 2019 and were selected for the NeurIPS 2018 Best Demo Awards. Tony holds more than 10 US patents in intelligent transportation, supply chain, and recommendation systems.

Abstract:
A focal point of reinforcement learning for ridesharing is value function learning. In this talk, we will review the productionized RL methods based on offline learning, and then discuss recently deployed work that shifts toward online, on-policy updates. We will demonstrate how shared value functions can be used to coordinate multiple rideshare levers. Finally, we will discuss learning value functions for individual market participants (on both the supply and demand sides) while ensuring that they collectively approximate the system-level values well.