TITLE: Some results in High-dimensional Statistics
ABSTRACT:
High-dimensional statistics is one of the most active research topics in modern statistics. The complexity of data, in both size and structure, poses new challenges for statisticians who need to separate useful information from noise efficiently and accurately. The purpose of this thesis is to narrow the gap between theory and practice in high-dimensional statistics by studying some of the assumptions widely adopted in the literature and by introducing new testing procedures. More specifically, it consists of two parts covering $l_1$-regularized estimation for time series and testing for sparse Gaussian graphical models.
The first chapter studied the application of $l_1$-regularized regression methods to Gaussian vector autoregressive processes. We decomposed the classical regression model into smaller submodels and obtained sparse solutions by applying $l_1$-penalties. We showed that, under mild conditions, the design matrices corresponding to the submodels are generated from $\alpha$-mixing processes. This leads to a more general question: how well does an $l_1$-regularized estimator perform for a linear model whose random design matrix is generated by an $\alpha$-mixing Gaussian process with an exponential decay rate? Our main result verified the restricted eigenvalue assumption for the mixing random design using the generic chaining technique and derived $l_p$ error bounds for the Lasso and the Dantzig selector. We also studied sufficient conditions under which a VAR(p) model guarantees tight error bounds for the solutions. Finally, we illustrated the variable selection and estimation performance of the Lasso through several sets of simulations.
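As an illustration of the decomposition described above, here is a sketch for the simplest case, a VAR(1) process of dimension $d$. The symbols $A$, $a_j$, $\varepsilon_t$, $n$ and $\lambda$, as well as the $1/n$ normalization, are assumptions made for this example and may differ from the notation and scaling used in the thesis.

$$
X_t = A X_{t-1} + \varepsilon_t, \qquad t = 1, \dots, n,
$$

where $X_t \in \mathbb{R}^d$ and $A \in \mathbb{R}^{d \times d}$. Writing $a_j^\top$ for the $j$-th row of $A$, each coordinate gives one submodel

$$
X_{t,j} = a_j^\top X_{t-1} + \varepsilon_{t,j}, \qquad j = 1, \dots, d,
$$

and a sparse estimate of each row can be obtained separately from the $l_1$-penalized least squares problem

$$
\hat{a}_j = \arg\min_{a \in \mathbb{R}^d} \; \frac{1}{n} \sum_{t=1}^{n} \left( X_{t,j} - a^\top X_{t-1} \right)^2 + \lambda \| a \|_1 .
$$

In every submodel the rows of the design matrix are the lagged observations $X_0, \dots, X_{n-1}$, which are dependent over time. This is why a restricted eigenvalue condition has to be verified for a mixing design rather than for one with independent rows.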
In the second chapter, we proposed a new statistic to test the decomposable structure of a Gaussian graphical model in the high-dimensional setting. It is based on quadratic forms of the eigenvalues of the sample covariance matrix. When the null hypothesis corresponds to a group independence structure, we derived the asymptotic distribution of the proposed statistic and showed that it is invariant under non-singular linear transformations within each group. When testing an arbitrary decomposable structure, a simple asymptotic distribution of the statistic is not available, so we suggested a simulation-based method to approximate the null distribution and calculate the corresponding $p$-value. We also gave numerical results, including both simulations and an empirical example, to study the proposed testing procedure in different scenarios.
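The general idea of a simulation-based $p$-value can be sketched in a few lines of Python. This is only a minimal illustration under strong assumptions. The statistic `toy_statistic` (the trace of the squared sample covariance matrix, which equals the sum of its squared eigenvalues) is a placeholder and not the statistic proposed in the thesis. The null model in `simulate_null` (independent standard normal variables) is likewise hypothetical. All function names are invented for this example.

```python
import random


def sample_covariance(rows):
    """Sample covariance matrix (divisor n - 1) of a list of equal-length rows."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [
        [sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
         for j in range(p)]
        for i in range(p)
    ]


def toy_statistic(rows):
    """Placeholder statistic tr(S^2), the sum of squared eigenvalues of S.

    Assumption: this stands in for the thesis statistic, which is not
    specified in the abstract.
    """
    s = sample_covariance(rows)
    # For a symmetric matrix, tr(S^2) is the sum of all squared entries.
    return sum(v * v for row in s for v in row)


def simulate_null(n, p, rng):
    """Draw n observations of p independent N(0, 1) variables (hypothetical null)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]


def monte_carlo_p_value(observed, n, p, n_sim=200, seed=0):
    """Estimate P(T >= observed) under the null by repeated simulation."""
    rng = random.Random(seed)
    exceed = sum(
        toy_statistic(simulate_null(n, p, rng)) >= observed
        for _ in range(n_sim)
    )
    # Adding 1 to numerator and denominator keeps the estimate strictly positive.
    return (1 + exceed) / (1 + n_sim)
```

A call such as `monte_carlo_p_value(toy_statistic(data), n=len(data), p=len(data[0]))` would then return the estimated $p$-value for a data set `data` given as a list of rows. Increasing `n_sim` makes the estimate more precise at a higher computational cost.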