PhD Proposal by Jun Kun Wang

*********************************
There is now a CONTENT FREEZE for Mercury while we switch to a new platform. It began on Friday, March 10 at 6pm and will end on Wednesday, March 15 at noon. No new content can be created during this time, but all material in the system as of the beginning of the freeze will be migrated to the new platform, including users and groups. Functionally the new site is identical to the old one. webteam@gatech.edu
*********************************

Event Details
  • Date/Time:
    • Monday March 23, 2020 - Tuesday March 24, 2020
      1:00 pm - 2:59 pm
  • Location: Klaus 2100
  • Phone:
  • URL:
  • Email:
  • Fee(s):
    N/A
  • Extras:
Contact
No contact information submitted.
Summaries

Summary Sentence: Understanding Modern Techniques in Optimization: Momentum, Over-parametrization, and Accelerated Gradient Methods.

Full Summary: No summary paragraph submitted.

Title: Understanding Modern Techniques in Optimization: Momentum, Over-parametrization, and Accelerated Gradient Methods.

 

Jun-Kun Wang

PhD Student

School of Computer Science

College of Computing

Georgia Institute of Technology

 

Date: Monday, March 23th, 2019

Time: 13:00 pm

Location: Klaus 2100

Location: Cisco Webex

Meeting number (access code): 792 819 876

Meeting password: proposal

Meeting link: 

https://jimwang-51.my.webex.com/jimwang-51.my/j.php?MTID=mebff15d08dc9cbe20f283f8247761399

 

Committee:

Dr. Jacob Abernethy (Advisor, School of Computer Science, Georgia Institute of Technology)

Dr. Le Song (School of Computational Science and Engineering, Georgia Institute of Technology)

Dr. Guanghui (George) Lan (School of Industrial and Systems Engineering, Georgia Institute of Technology)

 

Abstract:

Over the past decade, we have seen many success stories emerge from progress in deep learning. On the the theory side, however, we have remained far behind in understanding how and why deep learning works. One pertinent example is the performance of training methods: currently stochastic gradient descent (SGD) with momentum is the de facto champion of optimization methods, as many have observed that it is significantly faster than vanilla SGD for training neural nets. At the same time, a solid understanding of this phenomenon has remained elusive in non-convex optimization. A common heuristic is to choose a large value for the momentum parameter (e.g. \beta=0.9), for example, yet the literature offers little in the way of theoretical justification.

 

My thesis proposal will consist of three parts. The first part is about momentum. I will show that Polyak's momentum, the most popular one and the default momentum method in PyTorch, can be provably shown to lead to an acceleration compared to vanilla gradient descent for a broad class of non-convex optimization problems. Moreover, the larger the momentum parameter, the faster the global convergence, up to a certain threshold. I will talk about the implications of these results and also show that momentum can help escaping saddle points faster or finding a second order stationary point faster. The second part of my thesis proposal will be over-parametrization. It is widely observed that over-parametrization leads to an acceleration for training neural nets in practice. However, the underlying reason for the acceleration remains mysterious. I will propose understanding over-parametrization through a toy example. By over-parametrizing an objective function in certain ways, gradient descent can be shown to reaching at a desired solution faster.

 

The final part of the proposal involves understanding accelerated gradient methods through the lens of a two-player zero-sum game. While a game-theoretic interpretation of accelerated gradient methods has been previously observed (e.g. Lan and Zhou 2018), we show that this perspective provides a stronger connection between optimization and online learning/no-regret learning. I will show that if one pits two no-regret algorithms against each other in a particular zero-sum game, combined with the appropriate reweighting scheme, one can recover many existing accelerated gradient methods from the resulting game dynamics. These include Nesterov's method (Nesterov 1983;1988;2005) and accelerated proximal method (Bech and Teboulle 2009). Furthermore, the sum of the weighted average regret of both players implies the convergence rate, which leads to a simple alternative acceleration convergence proof. Our framework is very modular, and one can swap in/out different no-regret learning strategies to obtain various gradient methods with appropriate theoretical guarantees. In addition, I will show that the game approach also recovers Frank-Wolfe method and leads to some new accelerated Frank-Wolfe-like methods on certain constraint sets, answering an open question of Garber and Hazan, 2015.

Additional Information

In Campus Calendar
No
Groups

Graduate Studies

Invited Audience
Faculty/Staff, Public, Graduate students, Undergraduate students
Categories
Other/Miscellaneous
Keywords
Phd proposal
Status
  • Created By: Tatianna Richardson
  • Workflow Status: Published
  • Created On: Mar 4, 2020 - 1:41pm
  • Last Updated: Mar 19, 2020 - 4:20pm