PhD Defense by Zhibo Dai


Event Details
  • Date/Time:
    • Thursday April 16, 2020 - Friday April 17, 2020
      2:00 pm - 2:59 pm
  • Location: REMOTE: BLUE JEANS
  • Phone:
  • URL: BlueJeans Link
  • Email:
  • Fee(s):
    N/A
  • Extras:
Contact
No contact information submitted.
Summaries

Summary Sentence: Spectrum Reconstruction Technique and Improved Naive Bayes Models for Text Classification Problems

Full Summary: No summary paragraph submitted.

I’m Zhibo Dai, a fifth-year PhD student in the School of Mathematics at Georgia Tech. I will defend my thesis on the afternoon of April 16, from 2 pm to 3 pm ET, in BlueJeans meeting 866242745.

 

My thesis is titled "Spectrum Reconstruction Technique and Improved Naive Bayes Models for Text Classification Problems". The abstract and committee information are as follows:

 

Abstract 
This thesis studies two topics. The first part concerns the spectrum reconstruction technique. Eigenvalues play an important role in many research fields and underlie many practical techniques, such as PCA (Principal Component Analysis). We believe that related algorithms should perform better with more accurate spectrum estimation. Prof. Matzinger proposed an approximation formula, but it was given without proof. In our research, we show why the formula works. Moreover, when both the number of features and the dimension of the space go to infinity, we find the order of the error of the approximation formula, which depends on a constant C, the ratio of the dimension of the space and the number of features.

In the second part, we focus on applications of Naive Bayes models to text classification problems, in particular on two situations: 1) there is insufficient data for model training; 2) the partial label problem. We take Naive Bayes as our base model and modify it to achieve better performance in these two situations. To improve model performance and use as much information as possible, we introduce a correlation factor, which to some extent relaxes the conditional independence assumption of Naive Bayes. Compared with the traditional Naive Bayes estimates, the new estimates are biased but have much smaller variance, which gives better prediction results.

Committee 

  • Prof. Heinrich Matzinger – School of Mathematics (advisor) 
  • Prof. Federico Bonetto – School of Mathematics
  • Prof. Wenjing Liao – School of Mathematics
  • Prof. Tuo Zhao – School of Industrial and Systems Engineering
  • Prof. Ionel Popescu – School of Mathematics

Related Links

Additional Information

In Campus Calendar
No
Groups

Graduate Studies

Invited Audience
Faculty/Staff, Public, Graduate students, Undergraduate students
Categories
Other/Miscellaneous
Keywords
PhD Defense
Status
  • Created By: Tatianna Richardson
  • Workflow Status: Published
  • Created On: Apr 3, 2020 - 9:36am
  • Last Updated: Apr 3, 2020 - 9:36am