Dr. Jingjing Yang
Department of Human Genetics
Emory University
Abstract:
Although genome-wide association studies (GWAS) have identified thousands of SNP-trait associations (more than 55,000 reported in the GWAS Catalog), the biological mechanisms underlying these associations are largely unknown. Here, we propose a Bayesian variable selection model that integrates variant functional annotations to help understand and prioritize causal variants and mechanisms. Our method improves upon previous approaches by accounting for multiple categories of functional annotations and for genotype correlation due to linkage disequilibrium (LD) and, importantly, by quantifying the proportion of causal variants and the relative effect sizes of variants with different functional annotations. To apply our model to very large GWAS and sequencing data sets, we present a novel, scalable Bayesian computation method based on a block-wise expectation-maximization Markov chain Monte Carlo (EM-MCMC) algorithm. Our algorithm dramatically improves both computational speed and posterior sampling convergence by taking advantage of the block-like LD structure of the human genome. In simulations, we show that our method increases power and identifies more true signals than competing methods. In real data, we show that previous greedy approaches and MCMC implementations lead to apparently sub-optimal sets of likely causal variants because they fail to fully explore the set of possible causal variants. We applied our method to a genome-wide association study of age-related macular degeneration with about 33,000 individuals and more than 12 million genotyped and imputed variants. Our results show that non-synonymous markers are about 20 times more likely to be causal than other markers, and that the effect size of associated non-synonymous variants is about 3 times larger than that of other variants. Importantly, our method can help prioritize likely functional candidates for follow-up while disentangling the effects of genotype, linkage disequilibrium, and functional annotation.
Further, we implemented this method using only summary-level data from standard GWAS, which saves up to 85% of CPU time while producing the same results as an analysis of individual-level data. In conclusion, our method has the potential to shed light on the biological mechanisms of SNP associations and can help prioritize SNPs for downstream analysis.
Host: Greg Gibson