Title: Parallel Algorithms for Enabling Fast and Scalable Analysis of High-throughput Sequencing Datasets
Committee:
Dr. Srinivas Aluru, CoC, Chair, Advisor
Dr. Richard Vuduc, CoC
Dr. Moinuddin Qureshi, ECE
Dr. Linda Wills, ECE
Dr. Ada Gavrilovska, CoC
Abstract:
The objective of this research is to develop parallel algorithms that enable fast and scalable analysis of large-scale high-throughput sequencing datasets. The genome of an organism consists of one or more long DNA sequences called chromosomes, each a sequence of bases. Depending on the organism, the length of the genome can vary from several thousand bases to several billion bases. Genome sequencing, which involves deciphering the sequence of bases of the genome, is an important tool in genomics research. Sequencing instruments in use today can read only short DNA sequences. However, these instruments can read billions of such sequences at a time, and are used to sequence a large number of randomly generated short fragments of the genome. These fragments are a few hundred bases long and are commonly referred to as “reads”. This work specifically tackles three problems associated with high-throughput sequencing datasets: (1) parallel read error correction for large-scale genomics datasets, (2) partitioning of large-scale high-throughput sequencing datasets, and (3) parallel compression of large-scale genomics datasets.