Statistics and Data Science Seminar
Prof. Peter Qian
University of Wisconsin-Madison
Demystifying Gaussian Process Models with Massive Data
Abstract: Gaussian process (GP) models are widely used in statistics, optimization, machine learning and other fields. Fitting a GP model to massive data is not only a challenge but also a mystery. On one hand, the nominal accuracy of a GP model is supposed to increase with the number of data points. On the other hand, fitting such a model to a large number of points encounters numerical singularity. To reconcile this contradiction, I will present a method that achieves both numerical stability and theoretical accuracy in fitting a massive GP model. This method obtains nested subsamples of the data, builds submodels for the different subsets, and then combines these submodels to form an accurate prediction model. A decomposition of the overall model error into nominal and numeric portions is introduced to shed light on the theoretical underpinnings of the method. Bounds on the numeric and nominal errors are developed to show that substantial gains in overall accuracy can be attained with this sequential method. Efficient algorithms are introduced to generate the nested subsamples required by the method.
Wednesday April 10, 2013 at 4:00 PM in SEO 636
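To give a flavor of the idea in the abstract, the following is a minimal illustrative sketch (NumPy only) of fitting GP submodels on nested subsamples and combining them sequentially, with each larger submodel fitted to the residuals of the current predictor. This is an assumption-laden toy, not Prof. Qian's actual algorithm: the kernel, lengthscale, nugget, subsample sizes, and the residual-correction combination scheme are all illustrative choices.

```python
import numpy as np

def rbf_kernel(A, B, ls=0.2):
    """Squared-exponential kernel matrix between point sets A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls ** 2)

def gp_predict(Xtr, ytr, Xte, ls=0.2, nugget=1e-3):
    """Posterior mean of a zero-mean GP; the nugget keeps the solve stable."""
    K = rbf_kernel(Xtr, Xtr, ls) + nugget * np.eye(len(Xtr))
    alpha = np.linalg.solve(K, ytr)
    return rbf_kernel(Xte, Xtr, ls) @ alpha

# Toy 1-D data: noisy observations of a smooth function.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, (2000, 1))
y = np.sin(6 * X[:, 0]) + 0.05 * rng.standard_normal(2000)
Xte = np.linspace(0.0, 1.0, 100)[:, None]

# Nested subsamples: each subset contains the previous one (sizes are illustrative).
perm = rng.permutation(len(X))
sizes = [50, 200, 800]

pred_te = np.zeros(len(Xte))
resid = y.copy()
for n in sizes:
    sub = perm[:n]                                  # nested subsample of size n
    pred_te += gp_predict(X[sub], resid[sub], Xte)  # add this submodel's prediction
    resid -= gp_predict(X[sub], resid[sub], X)      # update residuals at all points

rmse = np.sqrt(np.mean((pred_te - np.sin(6 * Xte[:, 0])) ** 2))
```

Each stage solves only an n-by-n linear system, so no single solve involves the full (nearly singular) 2000-point kernel matrix, while the combined predictor still uses information from progressively larger subsets.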