Thesis Title: Distributionally Robust Stochastic Optimization with Applications in Statistical Learning
Advisor: Dr. Anton Kleywegt
Committee members:
Dr. Shabbir Ahmed
Dr. Jim Dai (Cornell ORIE)
Dr. Alexander Shapiro
Dr. Melvyn Sim (NUS Business School)
Date and Time: Monday, April 2, 2018, 9:30 AM
Location: ISyE Groseclose 402
Abstract:
In this thesis, we study distributionally robust stochastic optimization (DRSO), a recently emerging framework for decision-making under uncertainty. In this framework, instead of assuming that a known underlying probability distribution drives the uncertain behavior of a stochastic system, one seeks solutions that perform well for a family of distributions, so as to hedge against distributional uncertainty. This thesis focuses on the design of tractable models for DRSO. We develop novel formulations and insights for fundamental problems, and discover connections between different areas of optimization, statistics, and learning.
We first consider the key question of how to construct a good family of distributions to hedge against. We point out that such a family should be chosen to suit the application at hand, and that some of the choices that have been popular until recently are, for many applications, not good choices. We consider distributions that are within a chosen Wasserstein distance from a nominal distribution, for example, an empirical distribution resulting from available data. We demonstrate that the resulting distributions hedged against are more reasonable than those resulting from other popular choices of sets. Moreover, the problem of determining the worst-case expectation over the resulting family of distributions has desirable tractability properties. We derive a dual reformulation of the Wasserstein DRSO problem in a very general setting, by constructing (approximate) worst-case distributions explicitly via the first-order optimality conditions of the dual problem. The worst-case distributions have a concise structure and a clear interpretation.
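As a rough illustration of the dual reformulation mentioned above, the sketch below evaluates a commonly stated dual of the worst-case expectation over a type-1 Wasserstein ball around an empirical distribution: the minimum over lambda >= 0 of lambda * theta + (1/n) * sum_i max_xi [ f(xi) - lambda * |xi - x_i| ]. The abstract does not spell out the general setting of the thesis, so the one-dimensional finite support, the absolute-value transport cost, the grid search over lambda, and all function and variable names are assumptions made only for this example.

```python
import numpy as np

def wasserstein_worst_case(f, data, support, theta, lambdas):
    """Dual value of sup { E_Q[f] : W_1(Q, P_n) <= theta } on a finite support.

    Assumptions for this sketch: scalar outcomes, cost |xi - x_i|, and a
    grid search over the dual multiplier lambda instead of an exact solver.
    """
    data = np.asarray(data, dtype=float)
    support = np.asarray(support, dtype=float)
    # cost[i, j] = |support_j - data_i|: cost of moving data point i to support point j
    cost = np.abs(support[None, :] - data[:, None])
    f_vals = f(support)
    best = np.inf
    for lam in lambdas:
        # Inner maximization: best relocation of each data point, penalized by lam
        inner = np.max(f_vals[None, :] - lam * cost, axis=1)
        best = min(best, lam * theta + inner.mean())
    return best

# Hypothetical data: three observations, loss f(x) = |x|, radius theta = 0.5
data = np.array([-1.0, 0.5, 2.0])
support = np.concatenate([np.linspace(-10.0, 10.0, 2001), data])
theta = 0.5
lambdas = np.linspace(0.0, 5.0, 501)

value = wasserstein_worst_case(np.abs, data, support, theta, lambdas)
nominal = np.abs(data).mean()
print(value, nominal + theta)
```

Because f(x) = |x| is 1-Lipschitz and the support leaves room to move mass outward, the worst-case value in this toy instance is the empirical mean of the loss plus theta, which is what the grid search recovers.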
Next, we establish a connection between Wasserstein DRSO and regularization in statistical learning. More precisely, we identify a broad class of loss functions for which the Wasserstein DRSO problem is asymptotically equivalent to a regularization problem with a gradient-norm penalty. This relation provides new interpretations for problems involving regularization, including a great number of statistical learning problems and discrete choice models (e.g., multinomial logit). The connection also suggests a principled way to regularize high-dimensional non-convex learning problems, which is demonstrated through the training of Wasserstein generative adversarial networks in deep learning.
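To make the idea of a gradient-norm penalty concrete, the sketch below adds such a penalty to a logistic-regression objective. The abstract does not state the exact form of the penalty or the norm involved, so the choice here (empirical loss plus a radius times the root-mean-square Euclidean norm of the loss gradient with respect to the input) is an assumption for illustration, as are the function names and the synthetic data.

```python
import numpy as np

def logistic_loss(w, X, y):
    """Per-sample logistic loss log(1 + exp(-y * <w, x>)), with y in {-1, +1}."""
    margins = y * (X @ w)
    return np.log1p(np.exp(-margins))

def input_gradient_norms(w, X, y):
    """Euclidean norm of the loss gradient with respect to each input x.

    The gradient is -y * sigmoid(-margin) * w, so its norm is
    sigmoid(-margin) * ||w|| because |y| = 1.
    """
    margins = y * (X @ w)
    scale = 1.0 / (1.0 + np.exp(margins))  # sigmoid(-margin)
    return scale * np.linalg.norm(w)

def penalized_objective(w, X, y, radius):
    """Empirical loss plus radius times the RMS input-gradient norm (assumed form)."""
    losses = logistic_loss(w, X, y)
    grad_norms = input_gradient_norms(w, X, y)
    return losses.mean() + radius * np.sqrt(np.mean(grad_norms ** 2))

# Hypothetical synthetic data
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = np.where(X[:, 0] + 0.1 * rng.normal(size=50) > 0, 1.0, -1.0)
w = np.array([1.0, -0.5, 0.25])

plain = penalized_objective(w, X, y, radius=0.0)
robust = penalized_objective(w, X, y, radius=0.3)
print(plain, robust)
```

In this form the radius plays the role of the regularization weight: a radius of zero recovers the plain empirical loss, and a larger radius penalizes models whose loss is more sensitive to perturbations of the input.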
In the final part of the thesis, we consider robust decision-making when the data availability for the marginal distributions differs from that for the joint distribution, which occurs, for example, when the data streams of different random variables are collected at different frequencies. We propose a distributionally robust approach, which hedges against a family of joint distributions with fixed marginals and a dependence structure similar to that of a nominal joint distribution, such as an empirical distribution or the independent product distribution. The similarity of the dependence structure is measured through the Wasserstein distance between the copula of the joint distribution and the copula of the nominal distribution. We show that our choice of distance can be used as a new measure of dependence among random variables. Tractability of our new formulation is obtained by a novel constructive proof of strong duality, combining ideas from variational analysis and multi-marginal optimal transport theory.