COUNT STATISTICS FOR INFERRING GENE-GENE INTERACTIONS AND DISEASE-DRUG ASSOCIATIONS
HAIYAN HUANG – UNIVERSITY OF CALIFORNIA, BERKELEY
ABSTRACT
High-throughput technologies have made large-scale gene expression data readily available, and developing computational tools to process these data and distill insights into systems biology has become an important part of the “big data” challenge. Gene coexpression analysis is one of the earliest techniques developed for this purpose and is still widely used for functional annotation, pathway analysis, and, most importantly, the reconstruction of gene regulatory networks from gene expression data. However, most coexpression measures do not specifically account for local features in expression profiles. For example, patterns of gene association may well change across samples or exist only in a subset of them, especially when the samples are pooled from a range of experiments. In this talk, I will introduce several gene coexpression statistics that count local patterns of gene expression ranks and thereby take into account the potentially diverse nature of gene interactions. In particular, one of our statistics is designed for time-course data with local dependence structures, such as time series coupled over a subregion of the time domain. In an ongoing project, we are developing another count-based statistic to assess “reverse” correlations, which is useful for drug discovery. We provide asymptotic analysis of the distributions and power of these statistics, and evaluate their performance against a wide range of existing coexpression measures on simulated and real data. Our new statistics are fast to compute, robust against outliers, and show performance comparable to, and often better than, that of existing measures.
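To make the idea of "counting local patterns of gene expression ranks" concrete, the following is a minimal illustrative sketch, not the statistics defined in the papers listed below. It assumes two expression profiles measured over the same ordered samples, slides a window of length k along them, and counts the windows in which the two genes share the same rank pattern, as well as the windows in which one pattern is the reverse of the other. The function names, the window length k, and the tie-breaking rule are assumptions made for illustration only; the exact definitions, normalizations, and asymptotic results are given in the relevant papers.

```python
import numpy as np


def rank_pattern(window):
    """Return the rank pattern of a short window as a tuple.

    Ranks are obtained with a double argsort; ties are broken by
    position (an assumption of this sketch).
    """
    order = np.argsort(window, kind="stable")
    return tuple(np.argsort(order, kind="stable"))


def local_rank_match_count(x, y, k=3):
    """Count contiguous windows of length k with matching rank patterns.

    Returns (same, reverse): the number of windows where x and y have
    identical rank patterns, and the number where the pattern of x
    matches the pattern of -y (a reversed, or negative, local association).
    Hypothetical helper for illustration; not the published statistic.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim != 1 or x.shape != y.shape:
        raise ValueError("x and y must be 1-D arrays of equal length")
    if k < 2 or k > len(x):
        raise ValueError("k must satisfy 2 <= k <= len(x)")

    same = 0
    reverse = 0
    for i in range(len(x) - k + 1):
        px = rank_pattern(x[i:i + k])
        if px == rank_pattern(y[i:i + k]):
            same += 1
        if px == rank_pattern(-y[i:i + k]):
            reverse += 1
    return same, reverse


if __name__ == "__main__":
    # Two profiles that agree locally at the start and disagree later.
    x = [1, 2, 3, 4, 2, 5, 1]
    y = [5, 6, 7, 8, 9, 1, 4]
    print(local_rank_match_count(x, y, k=3))
```

Because only ranks within each window enter the count, a single extreme value changes at most the k windows that contain it, which gives some intuition for the robustness to outliers mentioned above. A separate count of reversed patterns is one simple way to picture what assessing "reverse" correlations could look like.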
RELEVANT PAPERS
1. Wang YR, Waterman MS, Huang H. Gene coexpression measures in large heterogeneous samples using count statistics. Proceedings of the National Academy of Sciences. 2014 Nov 18;111(46):16371-6.
2. Wang YR, Liu K, Theusch E, Rotter JI, Medina MW, Waterman MS, Huang H. Generalized correlation measure using count statistics for gene expression data with ordered samples. Bioinformatics. 2018 Feb 15;34(4):617-24.