Statistics and Data Science Seminar
Wei Sun
UIC
Cluster analysis of high-dimensional data via regularized K-means
Abstract: K-means clustering is a widely used tool for cluster analysis due to its conceptual simplicity and computational efficiency. However, its performance can deteriorate on high-dimensional data, where the number of variables is relatively large and many of them may carry no information about the clustering structure. In this talk we will discuss a novel method for high-dimensional cluster analysis based on regularized k-means clustering, which simultaneously clusters similar observations and eliminates redundant variables. The key idea is to formulate k-means clustering as a regularization problem with an adaptive group lasso penalty on the cluster centers. We will then discuss a selection criterion based on clustering stability that balances the trade-off between model fit and sparsity. The effectiveness of the proposed method is demonstrated through a variety of numerical experiments and applications to two gene microarray examples.
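As a rough illustration of the key idea, the sketch below alternates the usual k-means assignment step with a penalized center update. It is not the speaker's algorithm: it assumes an objective of the form (1/n) * sum_i ||x_i - c_k(i)||^2 + lam * sum_j w_j * ||c_(j)||_2, where c_(j) collects the j-th coordinate of all cluster centers, and it solves the center update with a few proximal-gradient steps. The names regularized_kmeans, group_soft_threshold, lam, weights, and init are hypothetical, and the adaptive weights w_j are simply passed in rather than estimated.

```python
import numpy as np

def group_soft_threshold(v, t):
    """Shrink the vector v toward zero as a group; exact zeros if ||v|| <= t."""
    norm = np.linalg.norm(v)
    if norm <= t:
        return np.zeros_like(v)
    return (1.0 - t / norm) * v

def assign(X, centers):
    """Return the index of the nearest center for every row of X."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def regularized_kmeans(X, k, lam, weights=None, init=None,
                       n_iter=30, n_inner=20, seed=0):
    """Sketch of k-means with a group lasso penalty on the cluster centers.

    Assumed objective (not taken from the talk):
        (1/n) sum_i ||x_i - c_k(i)||^2 + lam * sum_j w_j ||c_(j)||_2
    """
    X = np.asarray(X, dtype=float)
    X = X - X.mean(axis=0)  # center, so a zeroed variable means "no cluster signal"
    n, p = X.shape
    w = np.ones(p) if weights is None else np.asarray(weights, dtype=float)
    if init is None:
        # Hypothetical default: start from k distinct rows chosen at random.
        init = np.random.default_rng(seed).choice(n, size=k, replace=False)
    centers = X[np.asarray(init)].copy()

    for _ in range(n_iter):
        # Step 1: ordinary k-means assignment.
        labels = assign(X, centers)
        counts = np.bincount(labels, minlength=k).astype(float)
        sums = np.zeros((k, p))
        np.add.at(sums, labels, X)
        means = sums / np.maximum(counts, 1.0)[:, None]

        # Step 2: penalized center update by proximal gradient.
        a = counts / n                    # cluster proportions
        step = 1.0 / (2.0 * a.max())      # 1 / Lipschitz constant of the smooth part
        for _ in range(n_inner):
            z = centers - step * 2.0 * a[:, None] * (centers - means)
            for j in range(p):
                # Variable j is shrunk jointly across all k centers.
                centers[:, j] = group_soft_threshold(z[:, j], step * lam * w[j])

    return assign(X, centers), centers
```

Variables whose column of centers is exactly zero play no role in the assignment step, which is how the penalty removes uninformative variables; how lam should be chosen (for example by clustering stability, as in the talk) is not covered by this sketch.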
Wednesday, February 9, 2011 at 3:00 PM in SEO 636