Ph.D. Defense of Dissertation Announcement
Title: Detection of frameshifts and improving genome annotation
Ivan Antonov
School of Computational Science and Engineering
College of Computing, Georgia Institute of Technology
Date: Thursday, October 4, 2012
Time: 3:00PM
Location: TBA
Committee:
Abstract:
Analysis of intronless gene sequences available in public databases has revealed that some protein-coding regions contain frameshifts, i.e. sudden transitions from one reading frame to another. A frameshift in a protein-coding gene can be due to a sequencing error, an indel mutation, or a recoding event (programmed frameshift). Database annotations of prokaryotic genomes and eukaryotic mRNA sequences pay relatively little attention to frame transitions that disrupt protein-coding genes. Identifying genes with frameshifts and determining the true nature of those frameshifts will improve current genome annotation.
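To make the notion of a frame transition concrete, the following minimal Python sketch shows how a single-base deletion changes every downstream codon, and how reading the affected sequence in a different frame recovers the original codons past the deletion point. The sequence, the deletion position, and the helper name codons are hypothetical and chosen only for illustration; this is not how GeneTack detects frameshifts.

```python
def codons(seq, frame=0):
    """Split seq into complete triplets, starting at offset `frame` (0, 1 or 2)."""
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]

# Hypothetical toy coding sequence: start codon ATG ... stop codon TAA.
original = "ATGGCCAAGTTTGGCTAA"

# Simulate an indel: delete the single base at index 7.
mutated = original[:7] + original[8:]

print(codons(original))    # codons of the intact gene
print(codons(mutated))     # same frame: codons after the deletion are altered
print(codons(mutated, 2))  # shifted frame: the tail matches the original again
```

In this toy case the stop codon is no longer read in the original frame after the deletion, while the downstream part of the gene reappears intact in another frame. That kind of transition between frames is what the abstract calls a frameshift.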
In this dissertation research, we present a new program, GeneTack, for ab initio frameshift detection in intronless protein-coding nucleotide sequences. The frameshift prediction accuracy of GeneTack was higher by a significant margin than that of two previously developed programs (FrameD and FSFind). GeneTack was used to screen 1,106 complete prokaryotic genomes and 1,165,799 eukaryotic mRNAs. Genes with predicted frameshifts (fs-genes) were grouped into clusters based on sequence similarity and on conservation of the predicted frameshift position and direction. A total of 5,632 prokaryotic fs-genes from 239 clusters were predicted to be programmed frameshift candidates. Experiments were performed on sequences derived from 20 of the 239 clusters; programmed ribosomal frameshifting with efficiency higher than 10% was observed for four clusters. Eukaryotic clusters included known programmed frameshift genes and several candidate dual-coding genes.
All the tools and the database of fs-genes are available at the GeneTack web site: http://topaz.gatech.edu/GeneTack/