Title: Algorithmic Techniques for Variant Selection in Genome Graphs
Date: Wednesday, September 29th, 2021
Time: 8:00am – 10:00am (EDT)
Location: https://bluejeans.com/210080584/2788
Neda Tavakoli
Ph.D. Student, Computer Science
School of Computational Science and Engineering
College of Computing
Georgia Institute of Technology
Committee
———————
Dr. Srinivas Aluru (Advisor, School of Computational Science and Engineering, Georgia Tech)
Dr. Ümit V. Çatalyürek (School of Computational Science and Engineering, Georgia Tech)
Dr. Richard W. Vuduc (School of Computational Science and Engineering, Georgia Tech)
Dr. Tobin Isaac (School of Computational Science and Engineering, Georgia Tech)
Dr. Constantine Dovrolis (School of Computer Science, Georgia Tech)
Dr. Arkadi Nemirovski (School of Industrial and Systems Engineering, Georgia Tech)
Abstract
———————
Variation graph representations are projected to either replace or supplement conventional single-genome references because they capture population genetic diversity and reduce reference bias. Vast catalogues of genetic variants now exist for many species, and it is natural to ask which of these variants are crucial for circumventing reference bias during read mapping. This dissertation research takes a holistic approach to designing algorithmic techniques for variant selection in genome graphs, with three original contributions.
First, we develop a novel mathematical framework for variant selection, by casting it in terms of minimizing variation graph size subject to preserving paths of length α with at most Δ differences. This framework leads to a rich set of problems based on the types of variants (e.g., SNPs, indels, or structural variants), and whether the goal is to minimize the number of positions at which variants are listed or to minimize the total number of variants listed. We classify the computational complexity of these problems and provide efficient algorithms along with their software implementation when feasible. We empirically evaluate run-time performance and reduction in variation graph sizes achieved by the multiple algorithms that are proposed in this dissertation research.
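As an illustration of the framework described above, the following is a minimal sketch for the simplest setting only: SNP variants, with the goal of minimizing the number of retained variant positions. It assumes that dropping a SNP position costs at most one difference on any path through it, so the constraint reduces to "no window of α consecutive bases contains more than Δ dropped positions". The function name `select_snp_positions` and the left-to-right greedy rule are illustrative assumptions, not necessarily the algorithm developed in the dissertation.

```python
from collections import deque


def select_snp_positions(positions, alpha, delta):
    """Greedy sketch (hypothetical helper): split SNP positions into
    (retained, removed) so that every window of `alpha` consecutive
    bases contains at most `delta` removed positions."""
    retained, removed = [], []
    recent = deque()  # removed positions that can still share a window with p
    for p in sorted(positions):
        # Positions q and p fit in one length-alpha window only if q > p - alpha.
        while recent and recent[0] <= p - alpha:
            recent.popleft()
        if len(recent) < delta:
            # Dropping p keeps every window ending at or after p within budget.
            removed.append(p)
            recent.append(p)
        else:
            # Budget exhausted for this window: the variant must stay in the graph.
            retained.append(p)
    return retained, removed


if __name__ == "__main__":
    kept, dropped = select_snp_positions([10, 12, 15, 40, 41, 42, 100], alpha=10, delta=2)
    print("retained:", kept)
    print("removed:", dropped)
```

With α = 10 and Δ = 2, the positions 10, 12, and 15 fall in one window, so only two of them can be dropped and the third is retained. Indels and structural variants change path lengths and would need a different treatment than this sketch provides.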
Second, we establish benchmark data sets and tools to empirically evaluate the magnitude of graph reduction achieved in human chromosome variation graphs, using multiple α and Δ parameter values that correspond to short- and long-read resequencing characteristics. The graph size reduction can benefit downstream pan-genome analysis.
Third, we propose a novel mathematical model that extends our algorithmic framework to disease-related variants (mainly SNPs), since those variants contribute to common and complex diseases such as cancers, diabetes mellitus, asthma, cardiovascular disease, and mental illnesses. The proposed framework first filters out irrelevant variants while preserving disease markers, then identifies an optimal set of variants subject to preserving paths of length α with at most Δ differences. We separately consider the problems of minimizing the number of positions at which variants are retained and minimizing the total number of variants selected. The proposed framework could support early identification of individuals who are at risk of developing certain diseases. Routine screening can diagnose such diseases while the patient is still in the subclinical (asymptomatic) stage and has not yet developed clinical symptoms. The earlier a disease is diagnosed and treated, the better the outcome usually is, including a higher treatment success rate, a lower chance of hospitalization, and eventually a smaller financial burden on the health system. In addition, the ultimate goal of our mathematical model is to enable targeted gene therapy for lethal diseases such as aggressive cancers. In summary, our proposed framework can potentially help to detect, treat, and prevent specific diseases at very early stages while still reducing variation graph size.