Statistics and Data Science Seminar

George Karabatsos
UIC
New Approaches to Fast Approximate Bayesian Nonparametric Inference
Abstract: Dirichlet process (DP) mixture models, as well as models whose mixture distribution is assigned a general Bayesian nonparametric (BNP) prior distribution on the space of probability measures, are widely applied, flexible models that can provide reliable statistical inferences for complex data. For such Bayesian mixture models, posterior inference is usually conducted in practice using MCMC, which, however, is prohibitively slow for large data sets. Prior specification for such models can also be non-trivial in practice. As alternatives to MCMC, I consider two new approaches to fast, approximate BNP inference for large data sets. First, I show that if the ordinary least-squares (OLS) estimator of the linear regression coefficients is specified as a functional of the DP posterior distribution, then this functional has a posterior mean given by an observation-weighted ridge regression estimator, with the ridge (coefficient shrinkage) parameter given by the DP precision parameter, and has a heteroscedastic-consistent posterior covariance matrix. This result is based on the multivariate delta method applied to a prior-informed bootstrap approximation of the DP posterior distribution. Second, I consider an approximation to the BNP (infinite) mixture model, defined by ordinal regression mixture weights, that I introduced and studied in several articles. The approximate model is defined by a (large) finite mixture, with each component distribution multiplied by a histogram bin indicator function. I show that posterior inference with this approximate BNP model can be conducted by iteratively-reweighted least squares estimation for the mixture weight parameters and least-squares estimation for the component densities, all involving computations that are orders of magnitude faster than MCMC-based inference for the original mixture model. This is also true for a version of the approximate model that is defined by an ordinal regression of DPs. I illustrate the two approximate BNP methods through analyses of real data sets.
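To give a concrete feel for the bootstrap approximation in the first approach, the following is a minimal, hypothetical Python sketch (not the speaker's implementation) of the non-informative limit in which the DP precision parameter tends to zero, so the prior-informed bootstrap reduces to the Bayesian bootstrap: Dirichlet weights are drawn over the observations, the weighted OLS functional is evaluated at each draw, and the posterior mean and covariance of the coefficients are summarized. The simulated data and all names (e.g. weighted_ols) are illustrative assumptions, not part of the talk.

```python
# Illustrative sketch only: Bayesian-bootstrap approximation to the DP
# posterior of the OLS coefficient functional, in the limit where the
# DP precision parameter tends to zero (no prior-informed part).
import numpy as np

rng = np.random.default_rng(0)

# Simulated, illustrative data: n observations, p predictors plus intercept,
# with heteroscedastic noise so the robust covariance is of interest.
n, p = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
beta_true = np.array([1.0, 2.0, -1.0, 0.5])
y = X @ beta_true + rng.normal(scale=1.0 + 0.5 * np.abs(X[:, 1]), size=n)

def weighted_ols(X, y, w):
    """OLS functional evaluated at the weighted empirical distribution."""
    Xw = X * w[:, None]                      # rows of X scaled by the weights
    return np.linalg.solve(Xw.T @ X, Xw.T @ y)

# Dirichlet(1, ..., 1) observation weights = Bayesian bootstrap draws,
# i.e. draws from the DP posterior with precision parameter -> 0.
B = 2000
draws = np.empty((B, X.shape[1]))
for b in range(B):
    w = rng.dirichlet(np.ones(n))
    draws[b] = weighted_ols(X, y, w)

post_mean = draws.mean(axis=0)           # approximate posterior mean of the OLS functional
post_cov = np.cov(draws, rowvar=False)   # approximate (heteroscedasticity-robust) posterior covariance
print("posterior mean:", post_mean)
print("posterior sd:  ", np.sqrt(np.diag(post_cov)))
```

With a nonzero DP precision parameter, the result described in the abstract says the posterior mean instead takes the form of an observation-weighted ridge regression estimator whose shrinkage is governed by that precision parameter; the sketch above omits that prior-informed component.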
Wednesday October 5, 2016 at 4:00 PM in SEO 636