Statistics and Data Science Seminar
Dr. Mahtab Munshi
Takeda Global Research and Development Center
Impact of Missing Data on Building Prognostic Models and Summarizing Models Across Studies
Abstract: We examine the impact of missing data in two settings: the development of
prognostic models and the addition of new risk factors to existing risk
functions. Most statistical software presently available performs complete case
analysis, wherein only participants with known values for all of the
characteristics being analyzed are included in model development. Missing data
also affects the summarization of evidence across multiple studies using
meta-analytic techniques. As we progress in medical research, new covariates
become available for studying various outcomes. While we want to investigate the
influence of new factors on the outcome, we also do not want to discard the
historical datasets that do not have information about these markers. We
investigate different methods to estimate parameters for a model when some of
the covariates are missing. These methods include likelihood-based inference for
the study-level coefficients and likelihood-based inference for the logistic
model on the person-level data. We compare the results from our methods to the
corresponding results from complete case analysis. We focus our empirical
investigation on a historical example, the addition of high-density lipoproteins
to existing equations for predicting death due to coronary heart disease. We
verify our methods through simulation studies on this example.
Tea will be served at 3:15 PM.
Wednesday, March 8, 2006 at 3:30 PM in SEO 512