MULTI-SCALE INFERENCE OF GENETIC TRAIT ARCHITECTURE USING BIOLOGICALLY ANNOTATED NEURAL NETWORKS

LORIN CRAWFORD – BROWN UNIVERSITY

ABSTRACT

With the emergence of large-scale genomic datasets, there is a unique opportunity to leverage machine learning approaches as standard tools for genome-wide association (GWA) studies. Unfortunately, while machine learning methods have been shown to account for nonlinear data structures and exhibit greater predictive power than classic linear models, these same algorithms have also been criticized as “black box” techniques. Here, we present Biologically Annotated Neural Networks (BANNs), a novel probabilistic framework that makes machine learning fully amenable to GWA applications. BANNs are feedforward models with partially connected architectures that are based on biological annotations. This setup yields a fully interpretable neural network in which the input layer encodes SNP-level effects and the hidden layer models the aggregated effects among SNP-sets. Part of our key innovation is to treat the weights and connections of the network as random variables with prior distributions that reflect how genetic effects manifest at different genomic scales. The BANNs software uses scalable variational inference to provide posterior summaries that allow researchers to simultaneously perform (i) fine-mapping with SNPs and (ii) enrichment analyses with SNP-sets on complex traits. Through simulations, we show that our method improves upon state-of-the-art fine-mapping and enrichment approaches across a wide range of genetic architectures. We then further illustrate the benefits of BANNs by analyzing real GWA data assayed in approximately 2,000 heterogeneous stock mice from the Wellcome Trust Centre for Human Genetics and approximately 7,000 individuals from the Framingham Heart Study. Lastly, using a subset of individuals of European ancestry from the UK Biobank, we show that BANNs is able to replicate, using statistics alone, known associations that previously required functional validation.
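As a rough illustration of the partially connected architecture described in the abstract, the following minimal sketch computes a forward pass in which each hidden unit (one per SNP-set) receives input only from the SNPs annotated to it. This is not the BANNs software: the function name, argument names, and the choice of a ReLU activation are assumptions made for the example, and the weights are treated as fixed numbers rather than as random variables with priors and variational posteriors.

```python
import numpy as np

def annotated_forward(X, mask, theta, w, b_hidden, b_out):
    """Forward pass of a partially connected, annotation-based network.

    Hypothetical names, for illustration only:
      X        : (n, p) genotype matrix (n individuals, p SNPs)
      mask     : (p, g) binary annotation matrix; mask[j, k] = 1 if SNP j
                 belongs to SNP-set k, else 0
      theta    : (p,) SNP-level weights (input layer)
      w        : (g,) SNP-set-level weights (hidden layer)
      b_hidden : (g,) hidden-layer biases
      b_out    : scalar output bias
    """
    # Zero out every connection that the annotation does not allow, so
    # hidden unit k only aggregates the SNPs assigned to SNP-set k.
    masked_weights = mask * theta[:, None]          # shape (p, g)
    hidden_in = X @ masked_weights + b_hidden       # shape (n, g)
    # ReLU is an assumed activation chosen for this sketch.
    hidden = np.maximum(hidden_in, 0.0)
    # The output combines the SNP-set-level effects into a trait prediction.
    return hidden @ w + b_out

# Toy example: 2 individuals, 3 SNPs, 2 SNP-sets.
X = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, 1.0]])
mask = np.array([[1, 0],
                 [1, 0],
                 [0, 1]])            # SNPs 0-1 in set 0, SNP 2 in set 1
theta = np.array([0.5, -1.0, 2.0])
w = np.array([1.0, 0.5])
predictions = annotated_forward(X, mask, theta, w, np.zeros(2), 0.0)
```

In this toy setup the mask plays the role of the biological annotation: changing which SNPs map to which SNP-set changes the network's connectivity without changing the code.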