Statistics and Data Science Seminar
Professor Kin-Yee Chan
National University of Singapore
Logistic Regression Trees
Abstract: Logistic regression is a powerful technique for fitting models to data with a
binary response variable, but the models are difficult to interpret if
collinearity, nonlinearity, or interactions are present. Moreover, it is hard to
judge model adequacy since there are few diagnostics for choosing variable
transformations and no true goodness-of-fit test. To overcome these problems,
we propose to fit a piecewise (simple, multiple, or stepwise) linear logistic
regression model by recursively partitioning the data and fitting a different
logistic regression in each partition. This allows nonlinear
features of the data to be modeled without requiring variable transformations.
Trend-adjusted chi-square tests are used to control bias in variable selection
at the intermediate nodes. This protects the integrity of inferences drawn from
the tree structure. The binary tree that results from the partitioning process
is pruned to minimize a cross-validation estimate of the predicted deviance.
This obviates the need for a formal goodness-of-fit test. Our algorithm, called
"LOTUS", is compared with standard stepwise logistic regression and two
well-known classification tree algorithms (QUEST and C4.5) on 13 real
datasets, with several containing tens to hundreds of thousands of observations.
Results will be presented in this talk.
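
The core idea of the abstract, recursively partitioning the data and fitting a separate logistic regression in each partition, can be illustrated with a minimal sketch. This is not the LOTUS algorithm itself. It assumes a single numeric predictor, it swaps in a greedy search over quartile cut points (minimizing the summed deviance of the two child fits) in place of the trend-adjusted chi-square tests, and it omits cross-validation pruning. All function names, the simulated data, and parameters such as max_depth and min_n are hypothetical choices made for illustration.

```python
import math
import random

def sigmoid(z):
    # Clip the linear predictor so exp() cannot overflow.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, iters=25, ridge=1e-3):
    """Fit p = sigmoid(a + b*x) by Newton-Raphson with a small ridge penalty."""
    a = b = 0.0
    for _ in range(iters):
        g0, g1 = -ridge * a, -ridge * b      # gradient of penalized log-likelihood
        h00 = h11 = ridge                    # negative Hessian entries
        h01 = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(a + b * x)
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b

def deviance(xs, ys, coef):
    """Binomial deviance: -2 times the log-likelihood of the fitted model."""
    a, b = coef
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(a + b * x)
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return -2.0 * total

def grow(xs, ys, depth=0, max_depth=2, min_n=30):
    """Fit a node model, then split on x if that lowers the total deviance."""
    coef = fit_logistic(xs, ys)
    node = {"n": len(xs), "coef": coef, "dev": deviance(xs, ys, coef)}
    if depth >= max_depth or len(xs) < 2 * min_n:
        return node
    srt = sorted(xs)
    best = None
    for q in (0.25, 0.5, 0.75):              # candidate cuts: quartiles of x
        cut = srt[int(q * len(srt))]
        left = [(x, y) for x, y in zip(xs, ys) if x <= cut]
        right = [(x, y) for x, y in zip(xs, ys) if x > cut]
        if len(left) < min_n or len(right) < min_n:
            continue
        dev = 0.0
        for part in (left, right):
            px, py = zip(*part)
            dev += deviance(px, py, fit_logistic(px, py))
        if best is None or dev < best[0]:
            best = (dev, cut, left, right)
    if best is None or best[0] >= node["dev"]:
        return node                          # no split improves the fit
    _, cut, left, right = best
    node["cut"] = cut
    node["left"] = grow(*zip(*left), depth + 1, max_depth, min_n)
    node["right"] = grow(*zip(*right), depth + 1, max_depth, min_n)
    return node

def predict(node, x):
    """Route x to its leaf and evaluate that leaf's logistic model."""
    while "cut" in node:
        node = node["left"] if x <= node["cut"] else node["right"]
    a, b = node["coef"]
    return sigmoid(a + b * x)

def leaf_deviance(node):
    """Total training deviance summed over the leaves of the tree."""
    if "cut" not in node:
        return node["dev"]
    return leaf_deviance(node["left"]) + leaf_deviance(node["right"])

# Simulated data (an assumption): the true logit is V-shaped in x,
# so no single linear logistic model can fit it well.
random.seed(0)
xs = [random.uniform(-3.0, 3.0) for _ in range(600)]
ys = [1 if random.random() < sigmoid(1.5 * abs(x) - 1.5) else 0 for x in xs]

tree = grow(xs, ys)
print("root deviance:", round(tree["dev"], 1))
print("leaf deviance:", round(leaf_deviance(tree), 1))
```

Because the sketch accepts any split that lowers the training deviance, it will tend to overfit. That is the role the abstract assigns to pruning against a cross-validation estimate of the predicted deviance.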
Wednesday, January 18, 2006, at 3:30 PM in SEO 512