Statistics and Data Science Seminar
Hani Aldirawi
University of Illinois at Chicago
Identifying Appropriate Probabilistic Models for Sparse Discrete Data
Abstract: Modeling sparse, discrete data such as microbiome and insurance claim data is challenging because of the excess of zeros. Many probabilistic models have been used for sparse data, including Poisson, negative binomial, zero-inflated Poisson, and zero-inflated negative binomial models. We propose a statistical procedure for identifying the most appropriate discrete probabilistic models for zero-inflated or hurdle models, based on the p-value of the discrete Kolmogorov-Smirnov (KS) test when the population parameters are unknown. We develop a general procedure for estimating the parameters of a large class of zero-inflated and hurdle models. We also develop a general likelihood ratio test, based on the Neyman-Pearson lemma, for choosing the best model when more than one is appropriate.
Wednesday, March 13, 2019 at 4:00 PM in 636 SEO