Statistics and Data Science Seminar
Jun Xie
Purdue University
High dimensional classification and its application in pharmacogenomics research
Abstract: Many statistical classification methods, e.g., Fisher's linear
discriminant analysis, cannot be directly applied to high dimensional
data, where the number of variables is larger than the sample size. While
high dimensional data analysis has been broadly discussed in the statistics
community, the impact of dimensionality on classification is poorly
understood. We examine and compare high dimensional classification
methods through an application in pharmacogenomics research, where
high-dimensional gene expression microarray data are used to predict
patients' responses to a drug. Most gene expression classification
studies detect strong signals, for instance tumor versus normal; by
contrast, classifying patients as responders or non-responders is more
challenging, and the classifier may be nonlinear. We introduce several new
classification methods, including a sparse linear discriminant method,
random projection, and a distribution-based classification method involving
second-order interactions, as potential tools to deal with high
dimensionality. We also call attention to the theory of high
dimensional classification, where only a few results are available.
Wednesday October 27, 2010 at 3:00 PM in SEO 636