Title:
Recent Advances in Training Acceleration by Decentralized Communication

9:00 - 10:00, Dec. 17, 2022, UTC+8.

Wotao Yin

Dr. Wotao Yin is a scientist at Alibaba DAMO Academy, where he directs the Decision Intelligence Lab. He was a tenured professor in the Department of Mathematics at UCLA (2013-2021), and before that an assistant professor (2006-2012) and tenured associate professor (2012-2013) in the Department of Computational and Applied Mathematics at Rice University. His honors include the NSF CAREER Award, the Alfred P. Sloan Research Fellowship, the Morningside Gold Medal of Applied Mathematics, the DAMO Award, and the Egon Balas Prize. Since 2018 he has ranked among the top 1% most highly cited mathematicians worldwide.

Abstract: 
As deep neural network models grow increasingly massive, classic communication approaches have become a bottleneck. With n agents, the typical server-worker topology takes O(n) time to complete each Reduce communication. Ring-AllReduce, while making full use of the available bandwidth, still incurs O(n) latency for each AllReduce communication. Decentralized communication, initially introduced for wireless sensor networks, has become a viable alternative for distributed training. This talk presents recent advances in designing communication topologies for faster decentralized training of large neural networks, together with the open-source software framework BlueFog. With these topologies, each communication round takes O(1) time, so the total training time is significantly reduced. This line of research complements communication compression techniques such as gradient sparsification, quantization, and communication skipping.
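To make the O(1)-per-round claim concrete, below is a minimal NumPy sketch of gossip averaging over a one-peer exponential graph, one of the topologies studied in this line of work. In every round each agent exchanges with exactly one peer (O(1) communication per round), and when the number of agents n is a power of two, log2(n) rounds yield the exact global average at every agent. The function name and simulation setup are illustrative assumptions, not BlueFog's API.

    import numpy as np

    def one_peer_exponential_averaging(values):
        # Gossip averaging over a one-peer exponential graph (toy simulation).
        # In round t, agent i averages with agent (i + 2^t) mod n,
        # so each agent talks to exactly one peer per round: O(1).
        x = np.asarray(values, dtype=float).copy()
        n = len(x)
        for t in range(int(np.log2(n))):
            offset = 2 ** t
            # np.roll(x, -offset)[i] == x[(i + offset) % n]
            x = 0.5 * (x + np.roll(x, -offset))
        return x

    vals = np.arange(8, dtype=float)             # local values of 8 agents
    print(one_peer_exponential_averaging(vals))  # every entry becomes 3.5

In decentralized training, a mixing step of this kind replaces the global AllReduce of model parameters or gradients, which is how the per-round communication cost drops from O(n) to O(1).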

Primary collaborators: Kun Yuan (Peking U), Bicheng Ying (Google), Xinmeng Huang (U Penn)