Title: Improvements in the Modeling of High Dimension/Low Sample Size Imbalanced Clinical Data Sets
Committee:
Dr. Kamaleswaran, Advisor
Dr. Inan, Co-Advisor
Dr. Anderson, Chair
Dr. Grunwell
Abstract: The objective of the proposed research is to address deficiencies in the analysis of clinical datasets. Clinical research datasets are expensive to collect and process, which prevents researchers from gathering and analyzing large numbers of samples in each trial. Combined with the high dimensionality of features frequently observed in these datasets, this makes the analysis of clinical data very challenging. Although methods exist to help overcome this problem and make such datasets more amenable to modeling, label imbalance still has a large impact on the outcome of any model developed from them. For example, in most clinical research the majority of patients fall into a single outcome class (either surviving or not surviving), so a model developed from such a dataset is likely to make biased predictions. Models trained on this population are more vulnerable to errors because of the bias imposed by the majority class, and the minority population is therefore exposed to false predictions. Because the minority population can change depending on which label is chosen for the analysis, this problem becomes even more important to solve.

This research aims to propose solutions that reduce this impact in clinical research. To do so, verified datasets have been collected and processed, prior research has been reviewed, and new approaches such as SMOGN have been tried. Promising preliminary results are presented at the end of this proposal to make the case for addressing this problem in imbalanced, high-dimensional clinical datasets with small sample sizes.
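To make the majority-bias point concrete, here is a minimal Python sketch on a hypothetical cohort of 95 survivors and 5 non-survivors (these counts are illustrative assumptions, not data from this research). A trivial model that always predicts the majority label reaches high accuracy while missing every minority patient. The sketch does not implement SMOGN or any method from the proposal; it only illustrates the failure mode the abstract describes.

```python
# Hypothetical, made-up cohort: label 0 = survived (majority), label 1 = did not survive (minority).
labels = [0] * 95 + [1] * 5


def majority_class_predictor(train_labels):
    """Return a 'model' that ignores its input and always predicts the most frequent label."""
    majority = max(set(train_labels), key=train_labels.count)
    return lambda _features: majority


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive):
    """Fraction of samples of class `positive` that were predicted as `positive`."""
    total = sum(1 for t in y_true if t == positive)
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    return hits / total if total else 0.0


predict = majority_class_predictor(labels)
# Feature vectors are irrelevant to this trivial model, so the sample index stands in for them.
preds = [predict(i) for i in range(len(labels))]

acc = accuracy(labels, preds)
minority_recall = recall(labels, preds, positive=1)
# Balanced accuracy: mean of per-class recall, which is insensitive to class proportions.
balanced = (recall(labels, preds, positive=0) + minority_recall) / 2

print(acc)              # 0.95
print(minority_recall)  # 0.0
print(balanced)         # 0.5
```

Plain accuracy (0.95) hides the fact that every minority patient is misclassified, whereas balanced accuracy (0.5) shows the model is no better than chance across classes. This is one reason per-class metrics are often reported for imbalanced clinical data.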