Graduate Statistics Seminar
Keren Li
Northwestern University
Score Matching Representative Approach
Abstract: We propose a fast and efficient strategy, called the representative approach,
for big data analysis with linear models and generalized linear models. Given a
partition of a big dataset, this approach constructs a representative data point
for each data block and fits the target model using the representative dataset. In
terms of time complexity, it is as fast as the subsampling approaches in the literature. As for efficiency, it estimates parameters more accurately than the
divide-and-conquer method. Based on comprehensive simulation studies and theoretical justifications, we recommend two representative approaches. For linear models
or generalized linear models with a flat inverse link function and moderate coefficients of continuous variables, we recommend mean representatives (MR). For other
cases, we recommend score-matching representatives (SMR). In an illustrative application to the Airline on-time performance data, MR and SMR perform as well as the
full-data estimate in the cases where that estimate is available. Furthermore, the proposed representative strategy is ideal for analyzing massive data dispersed over a network of interconnected
computers.
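
As a rough illustration of the mean-representative (MR) idea for a linear model, the sketch below collapses each data block to its mean and fits the model on those block means alone. It is not taken from the talk: the function names, the simulated data, the binning rule used to form the partition, and the choice to weight each representative by its block size are all assumptions made for this example.

```python
import numpy as np

def mean_representatives(X, y, block_ids):
    """Collapse each block to one point: the block means of X and y, plus block sizes."""
    ids = np.asarray(block_ids).ravel()
    blocks, inverse, counts = np.unique(ids, return_inverse=True, return_counts=True)
    X_sum = np.zeros((len(blocks), X.shape[1]))
    y_sum = np.zeros(len(blocks))
    np.add.at(X_sum, inverse, X)   # accumulate covariates per block
    np.add.at(y_sum, inverse, y)   # accumulate responses per block
    return X_sum / counts[:, None], y_sum / counts, counts

def fit_weighted_ols(X_rep, y_rep, weights):
    """Weighted least squares with an intercept (weights = block sizes, an assumption)."""
    A = np.column_stack([np.ones(len(y_rep)), X_rep])
    sw = np.sqrt(weights)
    coef, *_ = np.linalg.lstsq(A * sw[:, None], y_rep * sw, rcond=None)
    return coef

# Simulated data (hypothetical): y = 1 + 2*x1 - 3*x2 + noise.
rng = np.random.default_rng(0)
n = 10_000
X = rng.normal(size=(n, 2))
y = 1 + 2 * X[:, 0] - 3 * X[:, 1] + rng.normal(scale=0.1, size=n)

# A hypothetical partition: bins of x1 crossed with the sign of x2 (8 blocks).
block_ids = np.digitize(X[:, 0], [-1, 0, 1]) * 2 + (X[:, 1] > 0)

X_rep, y_rep, sizes = mean_representatives(X, y, block_ids)
print("representatives:", len(sizes))
print("MR coefficients:", fit_weighted_ols(X_rep, y_rep, sizes))
```

Only the handful of representatives enters the fit, so the model-fitting cost depends on the number of blocks rather than on the number of observations. The score-matching variant (SMR) is not sketched here.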
Wednesday, April 22, 2020 at 3:00 PM on Zoom