Manzil Zaheer: “Federated Optimization in NLP”

Title: Federated Optimization in NLP

Abstract: Modern NLP techniques, from the most basic bag-of-words classifiers to complex deep learning models, rely on efficient optimization techniques such as sparsity constraints and adaptivity. In this talk, we look at how to translate such techniques to federated learning and private settings. We begin by showing that enforcing constraints such as sparsity (as in text classifiers) is not straightforward: direct extensions of algorithms such as FedAvg suffer from the “curse of primal averaging,” resulting in poor convergence. As a solution, we discuss a new primal-dual algorithm, Federated Dual Averaging (FedDualAvg), which circumvents the curse of primal averaging by employing a novel server dual averaging procedure. Next, we look at introducing federated versions of adaptive optimizers, including Adagrad, Adam, and Yogi, which have had notable success in NLP. Unfortunately, the benefits of adaptivity may degrade when training with differential privacy, as the noise added to ensure privacy reduces the effectiveness of the adaptive preconditioner. We discuss a framework that uses simple public statistics, such as word frequencies, to regain the lost benefits when applying state-of-the-art optimizers in private settings. Finally, we end by discussing open questions for federated optimization in NLP.
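
For readers unfamiliar with the federated adaptive optimizers mentioned above, the sketch below illustrates the general pattern (in the spirit of FedAdagrad/FedAdam/FedYogi): clients run a few local SGD steps, and the server averages their model deltas and treats that average as a pseudo-gradient for a Yogi-style adaptive update. This is a minimal illustrative sketch, not the speaker's implementation; the toy least-squares objective, the synthetic client data, and all hyperparameters (server_lr, beta, tau, the number of local steps) are hypothetical choices made for illustration only.

# Minimal sketch of one family of federated adaptive optimizers
# (FedAdagrad/FedAdam/FedYogi-style). Hypothetical toy setup, not the
# speaker's code: clients solve local least-squares problems, the server
# applies a Yogi-style adaptive step to the averaged client delta.
import numpy as np

def client_update(w, data, lr=0.1, local_steps=5):
    """Run a few local SGD steps on one client's (X, y) least-squares data."""
    X, y = data
    w = w.copy()
    for _ in range(local_steps):
        grad = X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

def server_round(w, clients, v, server_lr=0.1, beta=0.9, tau=1e-3):
    """One federated round with a Yogi-style adaptive server step."""
    # The average of client deltas plays the role of a "pseudo-gradient".
    deltas = [client_update(w, data) - w for data in clients]
    d = np.mean(deltas, axis=0)
    # Yogi-style second-moment update (sign-controlled, slower-decaying
    # than Adam's exponential moving average).
    v = v - (1 - beta) * np.sign(v - d**2) * d**2
    # Preconditioned server step.
    w = w + server_lr * d / (np.sqrt(v) + tau)
    return w, v

# Toy run with synthetic clients drawn around a shared ground-truth model.
rng = np.random.default_rng(0)
dim, n_clients = 20, 8
w_true = rng.normal(size=dim)
clients = []
for _ in range(n_clients):
    X = rng.normal(size=(50, dim))
    clients.append((X, X @ w_true + 0.1 * rng.normal(size=50)))

w, v = np.zeros(dim), np.full(dim, 1e-6)
for r in range(100):
    w, v = server_round(w, clients, v)
print("distance to ground truth:", np.linalg.norm(w - w_true))

In this pattern the adaptive preconditioner 1/(sqrt(v) + tau) lives entirely on the server; the differential-privacy issue raised in the abstract arises because the averaged delta d is noised before it reaches the server, which corrupts the second-moment estimate v that the preconditioner depends on.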

Bio: Manzil Zaheer is currently a research scientist at Google DeepMind. He received his PhD in Machine Learning from the School of Computer Science at Carnegie Mellon University. His research interest is in developing intelligent systems that can utilize vast amounts of information efficiently and faithfully. His work has been at the intersection of statistical models, data structures, and optimization.