Title: Frank-Wolfe Methods for Optimization and Machine Learning
Date: Thursday, December 10, 2020
Time: 2pm EST
Location: https://bluejeans.com/982850197
Cyrille Combettes
Ph.D. student in Machine Learning
School of Industrial and Systems Engineering
Georgia Institute of Technology
Committee:
Prof. Sebastian Pokutta (advisor), Institute of Mathematics, Technische Universität Berlin and Department for AI in Society, Science, and Technology, Zuse Institute Berlin
Prof. Swati Gupta, School of Industrial and Systems Engineering, Georgia Institute of Technology
Prof. Guanghui (George) Lan, School of Industrial and Systems Engineering, Georgia Institute of Technology
Abstract:
The Frank-Wolfe algorithm (FW) has become a popular constrained optimization algorithm because it is simple and projection-free, and it has been successfully applied to problems in traffic assignment, matrix completion, computer vision, optimal transport, and many others. Its main drawback, however, lies in its convergence rate, which can be excessively slow due to naive descent directions. In the first part of the proposal, we address this issue by designing a boosting procedure that generates descent directions better aligned with the negative gradients while still preserving the projection-free property. Although the method is simple and intuitive, it is very effective: we derive sublinear to linear convergence rates and demonstrate computational speedups over the state of the art on a wide variety of experiments.
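
For background, the vanilla FW update only calls a linear minimization oracle over the feasible set and takes a convex-combination step, which is what keeps it projection-free; the boosting procedure above replaces the plain direction v - x with a better-aligned one. The following is a minimal illustrative sketch in Python/NumPy, using an assumed least-squares objective over the probability simplex as the example problem; it shows the classical algorithm, not the proposal's boosted variant.

import numpy as np

def frank_wolfe(grad, lmo, x0, num_iters=100):
    # Vanilla Frank-Wolfe: stays feasible using only a linear
    # minimization oracle (lmo), with no projection step.
    x = x0.copy()
    for t in range(num_iters):
        g = grad(x)
        v = lmo(g)                    # argmin over the feasible set of <g, v>
        gamma = 2.0 / (t + 2.0)       # standard open-loop step size
        x = x + gamma * (v - x)       # convex combination keeps x feasible
    return x

# Example (assumed for illustration): minimize ||Ax - b||^2 over the
# probability simplex, whose LMO returns the vertex with the smallest
# gradient coordinate.
rng = np.random.default_rng(0)
A, b = rng.normal(size=(30, 10)), rng.normal(size=30)
grad = lambda x: 2 * A.T @ (A @ x - b)
lmo = lambda g: np.eye(len(g))[np.argmin(g)]
x_star = frank_wolfe(grad, lmo, x0=np.ones(10) / 10)
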
In the second part of the proposal, we consider the large-scale finite-sum setting arising in many machine learning tasks. We improve the quality of the first-order information accessed by stochastic FW algorithms by blending in adaptive gradients via a sliding technique. We derive sublinear convergence rates and demonstrate through computational experiments on convex and nonconvex objectives that the blend of the two methods is successful. In particular, ours is the only method in these experiments to improve on the performance of vanilla stochastic FW when training neural networks.
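
In the finite-sum setting, stochastic FW variants replace the exact gradient by a minibatch estimate before calling the linear minimization oracle. The sketch below is a generic template under assumed choices (a synthetic least-squares finite sum, a growing batch schedule, and a simplex feasible set); the adaptive-gradient blending and the sliding technique of the proposal are not represented here.

import numpy as np

# Assumed finite-sum example: f(x) = (1/n) * sum_i 0.5 * (a_i^T x - b_i)^2
# minimized over the probability simplex.
rng = np.random.default_rng(1)
n, d = 1000, 20
A, b = rng.normal(size=(n, d)), rng.normal(size=n)

def minibatch_grad(x, batch):
    # Unbiased gradient estimate from a random minibatch of components.
    idx = rng.choice(n, size=batch, replace=False)
    return A[idx].T @ (A[idx] @ x - b[idx]) / batch

x = np.ones(d) / d
for t in range(200):
    g = minibatch_grad(x, batch=min(n, (t + 1) ** 2))  # growing batch, one common schedule
    v = np.eye(d)[np.argmin(g)]                        # simplex LMO
    x += (2.0 / (t + 2.0)) * (v - x)                   # projection-free update
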
Both previous contributions are motivated by the projection-free property of FW. In the last part of the proposal, we leverage the natural sparsity of the iterates of FW to study an application to the approximate Carathéodory problem. We show that FW generates a simple solution to the problem and that, with no modification of the algorithm, better cardinality bounds can be established in different scenarios using existing convergence analyses of FW.
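
To illustrate the connection, running FW on the objective ||x - p||^2 over the convex hull of a vertex set produces, after T iterations, an iterate supported on at most T + 1 vertices, i.e., a sparse convex combination approximating p; this is the sparsity exploited in the approximate Carathéodory problem. The snippet below is a small self-contained sketch on synthetic data, assumed purely for illustration and not reproducing the proposal's analysis.

import numpy as np

rng = np.random.default_rng(2)
V = rng.normal(size=(50, 30))                 # 50 vertices in R^30
p = V.T @ (np.ones(50) / 50)                  # a point inside their convex hull

x, support = V[0].copy(), {0}
for t in range(20):
    g = 2 * (x - p)                           # gradient of ||x - p||^2
    i = int(np.argmin(V @ g))                 # LMO over the vertex set
    support.add(i)
    x += (2.0 / (t + 2.0)) * (V[i] - x)       # iterate stays a convex combination

print(len(support), np.linalg.norm(x - p))    # few vertices used, small approximation error
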