PhD Defense by Haowen Zhang

Event Details
  • Date/Time:
    • Thursday, June 30, 2022
      3:00 pm - 5:00 pm
  • Location: Zoom
  • URL: Zoom
  • Fee(s):
    N/A
Contact
No contact information submitted.
Summaries

Summary Sentence: Efficient Methods for Read Mapping

Title: Efficient Methods for Read Mapping

Date: Thursday, June 30th, 2022

Time: 3pm - 5pm ET

Location: https://gatech.zoom.us/j/97605744790 


Haowen Zhang
School of Computational Science and Engineering
College of Computing
Georgia Institute of Technology

 

Committee:

Dr. Srinivas Aluru (Advisor, School of Computational Science and Engineering, Georgia Institute of Technology)
Dr. Ümit V. Çatalyürek (School of Computational Science and Engineering, Georgia Institute of Technology)
Dr. Kostas Konstantinidis (School of Civil and Environmental Engineering, Georgia Institute of Technology)
Dr. Xiuwei Zhang (School of Computational Science and Engineering, Georgia Institute of Technology)
Dr. Heng Li (Department of Biomedical Informatics, Harvard Medical School)

 

------------------------ 

 

Abstract:

DNA sequencing is the mainstay of biological and medical research. Modern sequencing machines can read millions of DNA fragments, sampling the underlying genomes at high throughput. Mapping the resulting reads to a reference genome is typically the first step in sequencing data analysis. The problem has many variants: depending on the sequencing technology, reads can be short or long, with a low or high error rate, and the reference can be a single genome or a graph representation of multiple genomes. It is therefore crucial to develop efficient computational methods for each of these problem classes. Moreover, continually declining sequencing costs and increasing throughput challenge previously developed methods and tools that cannot handle the growing volume of sequencing data.

 

This dissertation seeks to advance the state of the art in the established field of read mapping by proposing more efficient and scalable read mapping methods and by tackling emerging problem areas. Specifically, we design ultra-fast methods to map two types of reads: short reads for high-throughput chromatin profiling, and raw nanopore reads for real-time targeted sequencing. Tailored to the characteristics of these read types, our methods scale to larger sequencing data sets or correctly map more reads than state-of-the-art mapping software. Furthermore, we propose two algorithms for aligning sequences to graphs, the foundation of mapping reads to graph-based reference genomes. One algorithm improves the time complexity of existing sequence-to-graph alignment algorithms under linear or affine gap penalties. The other performs well in practice under the edit distance metric. Finally, we mathematically formulate the problem of validating paired-end read constraints when mapping sequences to graphs, and propose an exact algorithm that is also fast enough for practical use.
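For readers new to sequence-to-graph alignment, the following is a minimal sketch of the problem under the edit distance metric. It is not the dissertation's algorithm: it is the textbook dynamic program for the simpler case of an acyclic graph whose nodes each carry one character and are numbered in topological order (cyclic graphs need additional machinery). The function name `graph_edit_distance` and the graph encoding (`labels`, `preds`) are hypothetical. It returns the minimum number of edits between a read and the string spelled by any path in the graph, in O((|V| + |E|) · m) time for a read of length m.

```python
from typing import List, Sequence


def graph_edit_distance(labels: str, preds: Sequence[Sequence[int]], query: str) -> int:
    """Minimum edit distance between `query` and the label string of any graph path.

    Hypothetical encoding (assumption): labels[v] is the character on node v and
    preds[v] lists v's predecessors. Nodes must be in topological order, i.e.
    every u in preds[v] satisfies u < v, so the graph is acyclic.
    """
    m = len(query)
    # Virtual source row: aligning query[:j] before any node costs j insertions.
    # Using it as a predecessor of every node lets a path start anywhere.
    source = list(range(m + 1))
    rows: List[List[int]] = []
    best = m  # aligning the whole query to the empty path
    for v, ch in enumerate(labels):
        prev_rows = [source] + [rows[u] for u in preds[v]]
        row = [0] * (m + 1)
        for j in range(m + 1):
            # Node v's character is left unaligned (deletion from the graph).
            cost = min(p[j] + 1 for p in prev_rows)
            if j > 0:
                # query[j-1] is aligned to node v (match costs 0, mismatch 1).
                sub = 0 if query[j - 1] == ch else 1
                cost = min(cost, min(p[j - 1] + sub for p in prev_rows))
                # query[j-1] is inserted after node v's alignment.
                cost = min(cost, row[j - 1] + 1)
            row[j] = cost
        rows.append(row)
        best = min(best, row[m])  # a path may end at any node
    return best


if __name__ == "__main__":
    # Graph spelling A C (G | T) T: nodes 0..4, node 4 has two predecessors.
    labels = "ACGTT"
    preds = [[], [0], [1], [1], [2, 3]]
    for read in ["ACTT", "ACGT", "CGT", "ACAT"]:
        print(read, graph_edit_distance(labels, preds, read))
```

Each node row depends only on rows of earlier nodes, which is why a topological order suffices; this one-pass property is exactly what breaks down once the graph contains cycles.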

Additional Information

In Campus Calendar
No
Groups

Graduate Studies

Invited Audience
Faculty/Staff, Public, Undergraduate students
Categories
Other/Miscellaneous
Keywords
PhD Defense
Status
  • Created By: Tatianna Richardson
  • Workflow Status: Published
  • Created On: Jun 15, 2022 - 3:46pm
  • Last Updated: Jun 15, 2022 - 3:46pm