PhD Defense by Haowen Zhang

Event Details

Date/Time: Thursday, June 30, 2022, 3:00 pm - 5:00 pm
Location: Zoom
Fee(s): N/A

Summaries

Summary Sentence: Efficient Methods for Read Mapping

Title: Efficient Methods for Read Mapping

Date: Thursday, June 30th, 2022

Time: 3pm - 5pm ET

Location: https://gatech.zoom.us/j/97605744790

Haowen Zhang
School of Computational Science and Engineering
College of Computing
Georgia Institute of Technology

Committee:

Dr. Srinivas Aluru (Advisor, School of Computational Science and Engineering, Georgia Institute of Technology)
Dr. Ümit V. Çatalyürek (School of Computational Science and Engineering, Georgia Institute of Technology)
Dr. Kostas Konstantinidis (School of Civil and Environmental Engineering, Georgia Institute of Technology)
Dr. Xiuwei Zhang (School of Computational Science and Engineering, Georgia Institute of Technology)
Dr. Heng Li (Department of Biomedical Informatics, Harvard Medical School)

------------------------

Abstract:

DNA sequencing is the mainstay of biological and medical research. Modern sequencing machines can read millions of DNA fragments, sampling the underlying genomes at high throughput. Mapping the resulting reads to a reference genome is typically the first step in sequencing data analysis. The problem has many variants: depending on the sequencing technology, reads can be short or long and have a low or high error rate, and the reference can be a single genome or a graph representation of multiple genomes. It is therefore crucial to develop efficient computational methods for each of these problem classes. Moreover, continually declining sequencing costs and increasing throughput challenge previously developed methods and tools that cannot handle the growing volume of sequencing data.

This dissertation seeks to advance the state of the art in the established field of read mapping by proposing more efficient and scalable read mapping methods, and to tackle emerging problem areas. Specifically, we design ultra-fast methods to map two types of reads: short reads for high-throughput chromatin profiling, and raw nanopore reads for real-time targeted sequencing. Because they are tailored to the characteristics of these read types, our methods can scale to larger sequencing data sets or map more reads correctly than state-of-the-art mapping software. Furthermore, we propose two algorithms for aligning sequences to graphs, the foundation of mapping reads to graph-based reference genomes. One algorithm improves the time complexity of existing sequence-to-graph alignment algorithms for linear or affine gap penalties. The other achieves good empirical performance under the edit distance metric. Finally, we mathematically formulate the problem of validating paired-end read constraints when mapping sequences to graphs, and propose an exact algorithm that is also fast enough for practical use.

Additional Information

In Campus Calendar

Groups

Graduate Studies

Invited Audience

Faculty/Staff, Public, Undergraduate students

Categories

Other/Miscellaneous

Keywords

PhD Defense

Status

Created By: Tatianna Richardson
Workflow Status: Published
Created On: Jun 15, 2022 - 3:46pm
Last Updated: Jun 15, 2022 - 3:46pm