*********************************
There is now a CONTENT FREEZE for Mercury while we switch to a new platform. It began on Friday, March 10 at 6pm and will end on Wednesday, March 15 at noon. No new content can be created during this time, but all material in the system as of the beginning of the freeze will be migrated to the new platform, including users and groups. Functionally the new site is identical to the old one. webteam@gatech.edu
*********************************
Thesis Title: Stochastic Algorithms for Distributed Optimization and Machine Learning
Advisors: Dr. Guanghui (George) Lan
Committee members:
Dr. Arkadi Nemirovski
Dr. Sebastian Pokutta
Dr. Huan Xu
Dr. Le Song (School of Computational Science and Engineering, CoC, GT)
Date and Time: Monday, June 11, 2018, 10:30 AM
Location: ISyE Groseclose 402
Abstract:
In the big data era, machine learning acts as a powerful tool to help us make predictions and decisions. It has strong ties to the field of optimization, in the way the latter provides methods and theory. While data tends to be collected in a distributed fashion, the standard machine learning models require centralizing the training data on one machine or in a data center, which incurs significant communication cost and puts data privacy at risk. To circumvent such an issue, a variety of distributed machine learning models, i.e., optimization problems defined over different network systems, have been proposed and studied in the literature. In this thesis, we focus on designing and analyzing efficient stochastic algorithms for distributed machine learning problems that can achieve best-known communication complexities and optimal sampling complexities.
The first part of this thesis is devoted to investigate randomized incremental gradient (RIG) methods for solving the finite-sum optimization problems, which is the core problem in distributed optimization. By developing and proving the optimality of the primal-dual gradient method, which covers a variant of the well-known Nesterov's accelerated gradient method as a special case, we propose its randomized version, namely the randomized primal-dual gradient (RPDG) method. Similar to other RIG methods (e.g., SAG and SAGA), RPDG only involves the computation of one randomly selected component function per iteration. Moreover, by providing a lower complexity bound for the class of RIG methods for finite-sum optimization, we demonstrate the optimality of RPDG, and this is the first time such an optimal RIG method has been developed for solving finite-sum optimization problems.
In the second part of this thesis, we consider a distributed topology with a central authority, and study the distributed finite-sum optimization problems defined over such a star network topology. We propose an optimal RIG method, namely the random gradient extrapolation method (RGEM) and show that it does not require any exact gradient evaluations even at the initial point, but can still achieve the optimal communication and sampling complexities for solving distributed finite-sum optimization problems. To the best of our knowledge, this is the first time that such an optimal RIG method without any exact gradient evaluations has been presented for solving finite-sum optimization in the literature. In fact, without any full gradient computation, RGEM possesses iteration costs as low as pure stochastic gradient descent (SGD) methods, but achieves a much faster and optimal linear rate of convergence for solving deterministic finite-sum problems. Moreover, we extend RGEM for stochastic finite-sum optimization, i.e., at each iteration only one randomly selected network agent needs to compute an estimator of its gradient by sampling from its local data. Note that for these problems, it is difficult to compute the exact gradients even at the initial point.
In the third part of this thesis, we study another distributed topology, called the decentralized network topology, where there is no central authority in the distributed networks. Consider communication is the major bottleneck, we present a class of communication-efficient algorithms for solving stochastic decentralized nonsmooth optimization problems. We introduce a new stochastic decentralized primal-dual type method, called stochastic decentralized communication sliding (SDCS), where the agents can skip communications while solving their local subproblems iteratively. And we show that SDCS achieves the best-known communication complexities and not improvable sampling complexities. Preliminary numerical experiments are performed to show that SDCS can significantly save communication costs over some existing state-of-the-art decentralized methods in all our tested instances. Consider synchronization is another critical issue in decentralized optimization other than communication, we further extend the communication-sliding idea to the asynchronous setting. By randomly activating two network agents per iteration, the proposed asynchronous stochastic decentralized primal-dual type methods can maintain the communication and sampling complexities obtained by SDCS, but these methods can be applied to solve a broader class of decentralized stochastic problems, for example, composite objective functions with smooth structures.