Statistics and Data Science Seminar
Ming-Chung Chang
Academia Sinica
Supervised Stratified Subsampling: an Approach to Big Data Predictive Analytics
Abstract: Predictive analytics encompasses the use of statistical models for prediction. Its power, however, has been hindered by the growing volume of data in recent years. Owing to advances in technology, big data are now ubiquitous across disciplines. Such data richness can create difficulties for predictive analytics in terms of both computation time and numerical stability. In this talk, I will introduce a new subsampling approach that overcomes these difficulties for regression problems. The proposed method, referred to as supervised stratified subsampling, integrates a nonparametric regression technique with stratified sampling. Theoretical properties are developed to justify the method. Numerical studies show that the proposed method yields good predictions and is robust against model misspecification.
Wednesday, March 15, 2023 at 6:00 PM on Zoom