Thesis Title: New Progress in Hot-spots Detection in Spatial-temporal Data, Partial-differential-equation-based Model Identification, and Statistical Computing
Advisors:
Dr. Yajun Mei, School of Industrial and Systems Engineering, Georgia Tech
Dr. Xiaoming Huo, School of Industrial and Systems Engineering, Georgia Tech
Committee members:
Dr. Jianjun Shi, School of Industrial and Systems Engineering, Georgia Tech
Dr. Haomin Zhou, School of Mathematics, Georgia Tech
Dr. Sarah E. Holte, Fred Hutchinson Cancer Research Center
Date and Time: 1:00 pm (EST), Friday, April 9th, 2021
Meeting URL: https://bluejeans.com/101334221
Meeting ID: 101 334 221 (BlueJeans)
Abstract:
This thesis contributes to the sparse identification problem in spatio-temporal data and to its computation. Our study addresses (1) hot-spot detection in multivariate spatio-temporal data, (2) identification of partial differential equation (PDE) models, and (3) optimization for Least Absolute Shrinkage and Selection Operator (Lasso) type problems. The thesis comprises four main works.
In Chapter 1, we aim at sparse hot-spot detection in multivariate spatio-temporal data that are non-stationary over time. We propose an efficient statistical method that detects hot-spots through tensor decomposition in three steps. First, we decompose the observed data into three components: a smooth global mean, sparse local anomalies, and random noise. Next, we estimate the parameters with a combination of Lasso and fused Lasso penalties to address spatial sparsity and temporal consistency. Finally, we apply a Cumulative Sum (CUSUM) control chart to monitor the model residuals, which allows us to detect when and where hot-spot events occur. To demonstrate the usefulness of the proposed method, we compare it with several other methods in extensive numerical simulation studies and on a real crime rate dataset.
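The monitoring step can be illustrated with a minimal sketch. It assumes the model residuals are already available as one series per location and uses a textbook one-sided CUSUM recursion; the function names, the `drift` and `threshold` values, and the toy residuals are hypothetical and are not taken from the thesis.

```python
def cusum_first_alarm(residuals, drift, threshold):
    """One-sided upper CUSUM: return the first index whose statistic exceeds
    the threshold, or None if no alarm is raised."""
    stat = 0.0
    for k, r in enumerate(residuals):
        # Accumulate evidence of an upward shift; reset at zero otherwise.
        stat = max(0.0, stat + r - drift)
        if stat > threshold:
            return k
    return None


def detect_hot_spots(residuals_by_location, drift=0.5, threshold=2.0):
    """Run the chart per location; report where (key) and when (value)."""
    alarms = {}
    for location, series in residuals_by_location.items():
        k = cusum_first_alarm(series, drift, threshold)
        if k is not None:
            alarms[location] = k
    return alarms


# Toy residuals (assumed): location "B" shifts upward from index 4 onward.
residuals_by_location = {
    "A": [0.1, -0.2, 0.0, 0.1, 0.0, -0.1, 0.2, 0.1],
    "B": [0.1, -0.2, 0.0, 0.1, 1.5, 1.4, 1.6, 1.5],
}
alarms = detect_hot_spots(residuals_by_location)
print(alarms)
```

On this toy input only location "B" raises an alarm, two time steps after its shift begins, which is the "where" and "when" information the chart is meant to provide.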
In Chapter 2, we improve the methodology of Chapter 1 in two respects. First, we propose a more computationally efficient algorithm for sparse hot-spot detection in high-dimensional spatio-temporal data. Second, we focus on detecting hot-spots with temporal circularity, instead of temporal continuity as in Chapter 1. This helps us handle many bio-surveillance and healthcare applications, where data sources are measured at many spatial locations repeatedly over time, say, daily, weekly, or monthly. The usefulness of the proposed methodology is validated through numerical simulation and a real-world dataset of the weekly number of gonorrhea cases from 2006 to 2018 for 50 states in the United States.
In Chapter 3, we propose a two-stage method called Spline Assisted Partial Differential Equation involved Model Identification (SAPDEMI) to efficiently identify the underlying partial differential equation (PDE) model from noisy data. In the first stage, the functional estimation stage, we employ cubic splines to estimate the unobservable derivatives, which serve as candidate terms for the underlying PDE model. In the second stage, the model identification stage, we apply Lasso to identify the underlying PDE model. The contributions of the proposed SAPDEMI method are: (1) it is computationally efficient in the functional estimation stage because it achieves the lowest possible order of complexity, (2) it focuses on model selection in the model identification stage, whereas the existing literature mostly focuses on parameter estimation, and (3) we develop statistical properties of the method for correct identification.
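The two stages can be sketched roughly as follows. This is an illustration under stated assumptions, not the SAPDEMI implementation: it uses noise-free synthetic data from the transport equation u_t = -u_x, SciPy's interpolating `CubicSpline` in place of whatever spline fit the thesis uses for noisy data, a hypothetical four-term candidate library, and a plain proximal-gradient (ISTA) Lasso solver named `lasso_ista`.

```python
import numpy as np
from scipy.interpolate import CubicSpline


def lasso_ista(X, y, lam, n_iter=2000):
    """Minimize (1/2n)||y - X b||^2 + lam * ||b||_1 by proximal gradient."""
    n = X.shape[0]
    step = n / np.linalg.norm(X, 2) ** 2  # 1 / Lipschitz constant of the gradient
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        z = beta - step * (X.T @ (X @ beta - y) / n)
        beta = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft-threshold
    return beta


# Assumed synthetic data: u(x, t) = sin(x - t) solves u_t = -u_x.
x = np.linspace(0.0, 2.0 * np.pi, 60)
t = np.linspace(0.0, 1.0, 40)
xx, tt = np.meshgrid(x, t, indexing="ij")
u = np.sin(xx - tt)

# Stage 1 (functional estimation): derivatives from cubic splines.
spline_x = CubicSpline(x, u, axis=0)
u_x = spline_x(x, nu=1)
u_xx = spline_x(x, nu=2)
u_t = CubicSpline(t, u, axis=1)(t, nu=1)

# Stage 2 (model identification): Lasso over a candidate library.
names = ["u", "u_x", "u_xx", "u*u_x"]
library = np.column_stack([u.ravel(), u_x.ravel(), u_xx.ravel(), (u * u_x).ravel()])
beta = lasso_ista(library, u_t.ravel(), lam=0.01)
print(dict(zip(names, np.round(beta, 3))))
```

In this toy setting the `u_x` term should dominate, with a coefficient close to -1 (slightly shrunk by the penalty). With noisy observations a smoothing spline and a tuned penalty level would be needed.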
In Chapter 4, we focus on developing an algorithm to solve the optimization in Lasso-type problems, whose objective function is not strictly convex when the number of features exceeds the number of samples. To handle this non-strict convexity, we use a homotopic method, i.e., we use a sequence of surrogate functions to approximate the L1 penalty in the Lasso-type problem. The surrogate functions converge to the L1 penalty in the Lasso estimator. At the same time, each surrogate function is strictly convex, which enables a provably faster numerical rate of convergence. In this chapter, we demonstrate that by carefully defining the surrogate functions, one can prove a faster numerical convergence rate than that of any existing method for computing Lasso-type estimators.
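The homotopy idea can be illustrated on a scalar toy problem, minimizing 0.5 (x - b)^2 + lam |x|, whose exact solution is the soft-threshold of b. The surrogate sqrt(x^2 + eps^2) used here is a common smooth approximation of |x| chosen for illustration; it is an assumption, not the surrogate family defined in the thesis, and the names `homotopy_lasso_1d` and `eps_schedule` are hypothetical. The scalar problem is already strictly convex, so the sketch shows only the mechanics: solve a sequence of smooth problems with shrinking eps, warm-starting each from the previous solution.

```python
import math


def soft_threshold(b, lam):
    """Exact minimizer of 0.5 * (x - b)**2 + lam * |x|."""
    return math.copysign(max(abs(b) - lam, 0.0), b)


def homotopy_lasso_1d(b, lam, eps_schedule=(1.0, 0.1, 0.01, 0.001), inner_iters=500):
    """Approximate the scalar Lasso solution through smooth surrogates.

    Each surrogate objective 0.5 * (x - b)**2 + lam * sqrt(x**2 + eps**2)
    is smooth and strictly convex, so plain gradient descent applies.
    """
    x = 0.0
    for eps in eps_schedule:
        # The second derivative is bounded by 1 + lam / eps, so this step is safe.
        step = 1.0 / (1.0 + lam / eps)
        for _ in range(inner_iters):
            grad = (x - b) + lam * x / math.sqrt(x * x + eps * eps)
            x -= step * grad
        # x is kept as the warm start for the next, tighter surrogate.
    return x


for b in (3.0, 0.5):
    print(b, homotopy_lasso_1d(b, lam=1.0), soft_threshold(b, lam=1.0))
```

As eps shrinks, the surrogate minimizers approach the soft-threshold solution, and the warm start keeps the inner iterations close to each new minimizer even though the safe step size becomes small.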