Peter Freddolino, Ph.D.
Department of Biological Chemistry
Department of Computational Medicine and Bioinformatics
University of Michigan
Abstract
Recent advances in high-throughput sequencing technology have yielded a huge increase in our knowledge of genomic sequences, but DNA sequence information remains meaningless without corresponding functional insight. It is only through a synthesis of computational approaches and high-throughput experiments that any meaningful headway can be made in the task of moving from genome sequence information to functional information at the scales of modern biology.

We have recently launched two such initiatives, aimed at completely mapping the transcriptional regulatory logic and functional proteome of Escherichia coli. Using a broadly applicable, non-specific method for mapping genome-wide protein occupancy, we have begun to identify the binding motifs, functions, and condition-dependent behavior of many cryptic E. coli transcription factors. In the process, we have also identified heterochromatin-like silenced regions on bacterial chromosomes, which we find play a key role in regulating stress-response and virulence genes across several bacterial species.

To assign functions to poorly annotated proteins that lack homologs close enough for sequence-based annotation methods to be effective, we have recently developed a hybrid pipeline that combines structural prediction/alignment, sequence alignment, and protein-protein interaction information to obtain combined structure predictions and functional annotations for entire proteomes. We find that including structural information makes our workflow perform unusually well on difficult targets with limited sequence identity to annotated proteins. Applying our methods at the scale of entire proteomes yields a rich new source of information to seed detailed investigation of the functions of many previously mysterious protein-coding genes.