Statistics and Data Science Seminar
Dr. Caiyan Li
Takeda Pharmaceuticals
Statistical Methods for Analysis of Graph-Constrained Genomic Data
Abstract: Graphs and networks are common ways of depicting biological information.
Many biological processes are represented by graphs,
such as regulatory networks, metabolic pathways and protein-protein interaction networks.
Such a priori graph information is a useful supplement to standard numerical
data such as microarray gene expression data. In this presentation, we consider the
problem of regression analysis and variable selection when the covariates are linked
on a graph. We study a graph-constrained regularization procedure for regression analysis
that takes into account the neighborhood information of variables measured on a graph,
and we examine its theoretical properties. This procedure involves a smoothness penalty on the
coefficients, defined as a quadratic form of the Laplacian matrix
associated with the graph. We establish estimation and model selection
consistency results and provide estimation bounds for both fixed and
diverging numbers of parameters in regression models. We also develop a
second method that uses a Markov random field to incorporate the graph information
into the analysis of high-dimensional data. Finally, we demonstrate with simulations
and a real dataset that the proposed procedure can lead to better variable selection
and prediction than existing methods that ignore the graph information associated
with the covariates.
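As a rough illustration of the Laplacian smoothness penalty described in the abstract, the sketch below assumes the unnormalized graph Laplacian L = D - A (D the degree matrix, A the adjacency matrix), for which the quadratic form beta^T L beta equals the sum of (beta_u - beta_v)^2 over the edges (u, v); coefficients of linked variables are therefore pushed toward each other. It also assumes, hypothetically, that this term is combined with a lasso (L1) penalty and fitted by coordinate descent. The function names laplacian, graph_penalty and fit_graph_lasso, the objective scaling, and the tuning values are illustrative only and are not the speaker's implementation; the Markov random field method is not sketched here.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def laplacian(p: int, edges: Sequence[Tuple[int, int]]) -> Matrix:
    """Unnormalized Laplacian L = D - A of an undirected graph on p nodes (assumed form)."""
    L = [[0.0] * p for _ in range(p)]
    for u, v in edges:
        L[u][u] += 1.0
        L[v][v] += 1.0
        L[u][v] -= 1.0
        L[v][u] -= 1.0
    return L


def graph_penalty(beta: Sequence[float], L: Matrix) -> float:
    """Quadratic form beta^T L beta; for this L it equals sum over edges of (beta_u - beta_v)^2."""
    p = len(beta)
    return sum(beta[i] * L[i][k] * beta[k] for i in range(p) for k in range(p))


def soft_threshold(z: float, t: float) -> float:
    """Soft-thresholding operator used by the L1 (lasso) part of the update."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def fit_graph_lasso(X: Matrix, y: Sequence[float], L: Matrix,
                    lam1: float, lam2: float,
                    max_iter: int = 1000, tol: float = 1e-10) -> List[float]:
    """Hypothetical coordinate-descent fit of the objective
    0.5*||y - X beta||^2 + 0.5*lam2 * beta^T L beta + lam1 * ||beta||_1."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # current residual y - X beta (beta starts at zero)
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    for _ in range(max_iter):
        max_change = 0.0
        for j in range(p):
            # Inner product of column j with the partial residual that excludes beta_j.
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n))
            # Off-diagonal Laplacian terms: contribution of graph neighbours of j.
            neigh = sum(L[j][k] * beta[k] for k in range(p) if k != j)
            denom = col_sq[j] + lam2 * L[j][j]
            new = soft_threshold(rho - lam2 * neigh, lam1) / denom if denom > 0 else 0.0
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta  # keep resid = y - X beta
                beta[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta


if __name__ == "__main__":
    L = laplacian(3, [(0, 1), (1, 2)])  # path graph 0 - 1 - 2
    print(graph_penalty([1.0, 2.0, 4.0], L))  # 5.0, i.e. (1-2)^2 + (2-4)^2
    X = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    # With lam2 > 0 the fitted coefficients are smoothed toward their neighbours:
    # approximately [1.5, 2.0, 2.5] instead of the unpenalized [1.0, 2.0, 3.0].
    print(fit_graph_lasso(X, [1.0, 2.0, 3.0], L, lam1=0.0, lam2=1.0))
```

Each coordinate update minimizes the convex objective exactly in one coefficient, so the smooth Laplacian term only changes the denominator and shifts the soft-thresholding input by the neighbours' current values; larger lam2 ties linked coefficients more tightly, while lam1 still sets small coefficients to zero.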
Wednesday, September 11, 2013, at 4:00 PM in SEO 636