In partial fulfillment of the Requirements for the Degree of
Master of Science in Biology
in the
School of Biology
Juan Camilo Castro Gordillo
will defend his thesis
“Finding the needle in the haystack: Developing tools for genome detection in metagenomic datasets”
Wednesday, August 16, 2017
12:00 PM
School of Biomedical Engineering (Whitaker Building), Room 1232
Thesis Advisor
Dr. Kostas Konstantinidis
School of Civil and Environmental Engineering
Committee Members
Dr. I. King Jordan (Biological Sciences)
Dr. Frank Stewart (Biological Sciences)
ABSTRACT
Accurate detection of target microbial species in metagenomic datasets from environmental samples remains limited for two reasons: the limit of detection of current methods is typically inaccessible, and the frequency of false positives remains unknown. These false positives result from inadequate identification of regions of the genome that are either too highly conserved to be diagnostic (e.g., rRNA genes) or prone to frequent horizontal genetic exchange (e.g., mobile elements).
Our framework, called imGLAD, is based on mapping reads against a reference genome and subsequently calculating the likelihood that the genome is present based on logistic feature classification. imGLAD achieves high accuracy because it uses the sequence-discrete population concept to discriminate between metagenomic reads originating from the target organism and reads from co-occurring close relatives, masks regions of the genome that are not informative using the MyTaxa engine, and models both the sequencing breadth and depth to determine relative abundance and limit of detection. We validated imGLAD by analyzing metagenomic datasets derived from spinach leaves inoculated with the enteric pathogen Escherichia coli O157:H7 and showed that its limit of detection is comparable to that of PCR-based approaches (~1 cell/gram).
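As a rough illustration of the idea of modeling sequencing breadth and depth with a logistic classifier, the following minimal Python sketch computes both quantities from per-position read coverage and passes them through a logistic function. This is not imGLAD's actual code: the function names, the input format, and the weights are all hypothetical, and in practice the weights would have to be fitted to training data.

```python
import math


def breadth_and_depth(coverage):
    """Return (breadth, depth) for a list of per-position read counts.

    breadth: fraction of reference positions covered by at least one read.
    depth:   mean number of reads per reference position.
    The input format is an assumption made for this sketch.
    """
    if not coverage:
        raise ValueError("coverage must be non-empty")
    covered = sum(1 for c in coverage if c > 0)
    breadth = covered / len(coverage)
    depth = sum(coverage) / len(coverage)
    return breadth, depth


def presence_probability(breadth, depth, w_breadth, w_depth, bias):
    """Logistic model mapping breadth and depth to a probability.

    The weights and bias are hypothetical placeholders; they are not
    values taken from imGLAD.
    """
    z = bias + w_breadth * breadth + w_depth * depth
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    # Toy coverage vector over a 10-position reference (illustrative only).
    toy_coverage = [0, 2, 1, 0, 3, 1, 0, 0, 1, 2]
    b, d = breadth_and_depth(toy_coverage)
    p = presence_probability(b, d, w_breadth=5.0, w_depth=1.0, bias=-3.0)
    print(f"breadth={b:.2f} depth={d:.2f} probability={p:.3f}")
```

With any positive weights, higher breadth or depth yields a higher estimated probability that the genome is present, which is the qualitative behavior the abstract describes.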