UPS Delivers Optimal Phase Diagram for High Dimensional Variable Selection

*********************************
There is now a CONTENT FREEZE for Mercury while we switch to a new platform. It began on Friday, March 10 at 6pm and will end on Wednesday, March 15 at noon. No new content can be created during this time, but all material in the system as of the beginning of the freeze will be migrated to the new platform, including users and groups. Functionally the new site is identical to the old one. webteam@gatech.edu
*********************************

Event Details

Date/Time:
- Thursday April 7, 2011 - Friday April 8, 2011
  11:00 am - 11:59 am
Location: ISyE Executive classroom
Phone:
URL:
Email:
Fee(s):
N/A
Extras:

Contact

No contact information submitted.

Summaries

Summary Sentence: UPS Delivers Optimal Phase Diagram for High Dimensional Variable Selection

Full Summary: No summary paragraph submitted.

TITLE: UPS Delivers Optimal Phase Diagram for High Dimensional Variable Selection

SPEAKER: Jiashun Jin

ABSTRACT:

Consider a linear regression model \begin{equation*}
Y = X \beta + z, \qquad z \sim N(0, I_n),   \qquad X = X_{n, p},
\end{equation*} where both $p$ and $n$ are large but $p > n$.   The vector $\beta$ is unknown but is sparse in the sense that only a small proportion of   its coordinates is nonzero, and we are interested in identifying these nonzero ones. We model the coordinates of $\beta$ as samples from a two-component mixture $(1 - \eps) \nu_0 + \eps {\pi}$,   and the rows of $X$ as samples from $N(0, \frac{1}{n}\Omega)$, where $\nu_0$ is the point mass at $0$, $\pi$ is a distribution, and $\Omega$ is a $p$ by $p$ correlation matrix which is unknown but is presumably sparse.

We propose a two-stage variable selection procedure which we call the {\it UPS}.   This is a Screen and Clean procedure, in which   we screen with the Univariate thresholding, and clean with the Penalized MLE. In many situations, the UPS possesses two important properties: Sure Screening and Separable After Screening (SAS). These properties enable us to reduce the original regression problem to many small-size regression problems that can be fitted separately. As a result, the UPS is effective both in theory and in computation.

We measure the performance of variable selection procedure by the Hamming distance, and use an asymptotic framework where $p \goto \infty$ and $(\eps, \pi, n, \Omega)$ depend on $p$. We find that in many situations, the UPS achieves the optimal rate of convergence.
We also find that in the $(\eps_p, \pi_p)$ space, there is a three-phase diagram shared    by many choices of $\Omega$. In the first phase, it is possible to   recover all   signals. In the second phase, exact recovery is impossible, but it is possible to recover most of the signals.
In the third phase, successful variable selection is impossible. The UPS partitions the phase space in the same way that the optimal procedures do, and recovers most of the signals
as long as successful variable selection is possible.

The lasso and the subset selection (also known as the $L^1$- and $L^0$-penalization methods, respectively) are well-known approaches to variable selection. However,
somewhat surprisingly, there are regions in the phase space where neither the lasso nor the subset selection is rate optimal, even for very simple $\Omega$. The lasso is non-optimal because it is too loose in filtering out fake signals (i.e. noise that is highly correlated with a signal), and the subset selection is non-optimal because it tends to kill one or more signals   in correlated pairs, triplets, etc..

Additional Information

In Campus Calendar

Groups

School of Industrial and Systems Engineering (ISYE)

Invited Audience

No audiences were selected.

Categories

No categories were selected.

Keywords

No keywords were submitted.

Status

Created By: Anita Race
Workflow Status: Published
Created On: Mar 24, 2011 - 10:27am
Last Updated: Oct 7, 2016 - 9:54pm

Georgia Tech

UPS Delivers Optimal Phase Diagram for High Dimensional Variable Selection

Additional Information