Title: Distributed Learning and Inference in Deep Models
Committee:
Dr. Faramarz Fekri, ECE, Chair, Advisor
Dr. Ghassan AlRegib, ECE
Dr. Justin Romberg, ECE
Dr. Matthieu Bloch, ECE
Dr. Siva Theja Maguluri, ISyE
Abstract: In this thesis, we consider the challenges encountered in the training and inference of large deep models, especially on nodes with limited computational power and capacity. We study two classes of related problems: 1) distributed training of deep models, and 2) compression and restructuring of deep models for efficient distributed and parallel execution to reduce inference times. In particular, we consider the communication bottleneck in the distributed training and inference of deep models. In the first part of the thesis, we consider distributed deep learning. Data compression is a viable tool to mitigate the communication bottleneck. However, existing methods suffer from drawbacks such as increased variance of the stochastic gradients (SG), slower convergence rates, or added bias in the SG. We address these challenges from three different perspectives: 1) Information Theory and the CEO Problem, 2) Indirect SG compression via Matrix Factorization, and 3) Compressive Sampling.
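As an illustration of the compressive-sampling perspective (a minimal sketch under assumed details, not the schemes developed in the thesis), a worker could project its stochastic gradient onto a low-dimensional random subspace before communication, and the server could then recover an approximation. The Gaussian measurement matrix, the shared random seed, and the least-squares recovery below are illustrative assumptions only.

    import numpy as np

    def compress_gradient(grad, m, seed):
        # Compressive sampling: y = A @ g with a random Gaussian matrix A of
        # shape (m, d), m << d, so only m numbers are communicated.
        d = grad.size
        A = np.random.default_rng(seed).standard_normal((m, d)) / np.sqrt(m)
        return A @ grad.ravel()

    def reconstruct_gradient(y, d, seed, shape):
        # Minimum-norm least-squares estimate from the m measurements; a real
        # compressive-sampling scheme would use a sparse-recovery solver here.
        m = y.size
        A = np.random.default_rng(seed).standard_normal((m, d)) / np.sqrt(m)
        g_hat, *_ = np.linalg.lstsq(A, y, rcond=None)
        return g_hat.reshape(shape)

    # Worker: compress a local stochastic gradient (here a random stand-in).
    g = np.random.default_rng(1).standard_normal(1024)
    y = compress_gradient(g, m=128, seed=0)

    # Server: reconstruct an approximate gradient from the compressed message.
    g_hat = reconstruct_gradient(y, d=1024, seed=0, shape=g.shape)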
Next, we consider federated learning over wireless multiple access channels (MAC). To satisfy the communication and power constraints of the network and to take advantage of the over-the-air computation inherent in the MAC, we propose a framework based on random linear coding and develop efficient power management and channel usage techniques to manage the trade-off between power consumption and communication bit-rate.

In the second part of this thesis, we consider the distributed parallel implementation of an already-trained deep model on multiple workers. Since latency due to synchronization and data transfer among workers adversely affects the performance of the parallel implementation, it is desirable to minimize the interdependency among the parallel sub-models. To achieve this goal, we introduce RePurpose, an efficient algorithm that rearranges the neurons of the neural network and partitions them such that the interdependency among sub-models is minimized under the computation and communication constraints of the workers.
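To illustrate the partitioning idea behind RePurpose (a toy sketch under assumed details, not the algorithm itself), one can permute the output neurons of a layer and split them across workers so that each worker evaluates its slice independently. Choosing the permutation so as to minimize cross-worker dependencies is the thesis's contribution and is not performed here; the identity permutation below is a placeholder.

    import numpy as np

    def partition_layer(W, b, perm, num_workers):
        # Rearrange the output neurons of a fully connected layer (rows of W)
        # according to `perm`, then split them evenly among the workers.
        W_p, b_p = W[perm], b[perm]
        return list(zip(np.array_split(W_p, num_workers, axis=0),
                        np.array_split(b_p, num_workers)))

    def parallel_forward(x, shards):
        # Each worker computes its slice of the ReLU layer output independently;
        # concatenation is the only synchronization point in this toy example.
        return np.concatenate([np.maximum(Wk @ x + bk, 0.0) for Wk, bk in shards])

    # Example: a 16-neuron layer split across 4 workers with the identity
    # permutation, used purely for illustration.
    rng = np.random.default_rng(0)
    W, b, x = rng.standard_normal((16, 8)), rng.standard_normal(16), rng.standard_normal(8)
    y = parallel_forward(x, partition_layer(W, b, np.arange(16), num_workers=4))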