I’m Zhibo Dai, a 5th-year PhD student in the School of Mathematics at Georgia Tech. My thesis defense will take place on the afternoon of 4/16, between 2pm and 3pm ET, in BlueJeans meeting 866242745.
My thesis is titled "Spectrum Reconstruction Technique and Improved Naive Bayes Models for Text Classification Problems." The abstract and committee information are as follows:
Abstract
This thesis studies two topics. In the first part, we study a spectrum reconstruction technique. Eigenvalues play an important role in many research fields and are the foundation of practical techniques such as PCA (Principal Component Analysis), so related algorithms should perform better with a more accurate estimate of the spectrum. An approximation formula was proposed by Prof. Matzinger, but no proof was given. In our research, we show why the formula works, and, as both the number of features and the dimension of the space go to infinity, we derive the order of the error of the approximation, which depends on a constant C, the ratio of the dimension of the space to the number of features.
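To give a sense of the problem this part addresses, the sketch below (a minimal illustration, not the approximation formula from the thesis) uses NumPy to show how the eigenvalues of a sample covariance matrix deviate from the population spectrum when the dimension-to-sample-size ratio C is not small; the specific values of n, p, and the chosen population spectrum are arbitrary assumptions for the demo.

```python
# Minimal sketch: spectral distortion that spectrum-reconstruction methods try to
# correct. This is NOT the thesis's approximation formula; it only shows that the
# naive sample-covariance eigenvalues are biased when C = p / n is not small.
import numpy as np

rng = np.random.default_rng(0)
n, p = 500, 250                               # C = p / n = 0.5 (assumed values)
pop_eigs = np.linspace(1.0, 5.0, p)           # assumed population spectrum

# Draw n samples in dimension p with covariance diag(pop_eigs).
X = rng.standard_normal((n, p)) * np.sqrt(pop_eigs)

sample_cov = X.T @ X / n
sample_eigs = np.sort(np.linalg.eigvalsh(sample_cov))[::-1]

print("largest population eigenvalue :", pop_eigs.max())
print("largest sample eigenvalue     :", sample_eigs[0])    # noticeably inflated
print("smallest population eigenvalue:", pop_eigs.min())
print("smallest sample eigenvalue    :", sample_eigs[-1])   # noticeably deflated
```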
In the second part, we focus on applications of Naive Bayes models to text classification problems. In particular, we focus on two situations: 1) there is insufficient data for model training; 2) the partial label problem. We take Naive Bayes as our base model and improve it to achieve better performance in these two situations. To improve performance and to use as much information as possible, we introduce a correlation factor, which relaxes the conditional independence assumption of Naive Bayes. Compared with the traditional Naive Bayes estimates, the new estimates are biased but have much smaller variance, which gives better prediction results.
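For readers unfamiliar with the base model, here is a minimal multinomial Naive Bayes sketch for text classification with Laplace smoothing. The correlation factor described in the thesis is not reproduced here, since its exact form is not given in the abstract; the toy documents, labels, and function names are illustrative assumptions only.

```python
# Minimal multinomial Naive Bayes base model (standard estimates only; the
# thesis's correlation-factor modification is not shown). Toy data for illustration.
import numpy as np
from collections import Counter

docs   = ["good great film", "great acting", "bad boring plot", "boring bad film"]
labels = np.array([1, 1, 0, 0])                      # 1 = positive, 0 = negative

vocab = sorted({w for d in docs for w in d.split()})
index = {w: i for i, w in enumerate(vocab)}

# Term-count matrix: one row per document.
X = np.zeros((len(docs), len(vocab)))
for r, d in enumerate(docs):
    for w, c in Counter(d.split()).items():
        X[r, index[w]] = c

def fit_nb(X, y, alpha=1.0):
    """Class priors and Laplace-smoothed conditional word probabilities."""
    classes = np.unique(y)
    priors  = np.array([(y == c).mean() for c in classes])
    counts  = np.array([X[y == c].sum(axis=0) for c in classes]) + alpha
    cond    = counts / counts.sum(axis=1, keepdims=True)
    return classes, np.log(priors), np.log(cond)

def predict(doc, classes, log_prior, log_cond):
    x = np.zeros(len(vocab))
    for w, c in Counter(doc.split()).items():
        if w in index:                               # ignore out-of-vocabulary words
            x[index[w]] = c
    return classes[np.argmax(log_prior + log_cond @ x)]

classes, log_prior, log_cond = fit_nb(X, labels)
print(predict("great film", classes, log_prior, log_cond))   # expected: 1
```

With little training data, the conditional probability estimates above have high variance; the abstract's correlation factor is described as trading some bias for a reduction in that variance.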
Committee