Statistics and Data Science Seminar

Professor Kin-Yee Chan
National University of Singapore
Logistic Regression Trees
Abstract: Logistic regression is a powerful technique for fitting models to data with a binary response variable, but the models are difficult to interpret if collinearity, nonlinearity, or interactions are present. Moreover, it is hard to judge model adequacy, since there are few diagnostics for choosing variable transformations and no true goodness-of-fit test. To overcome these problems, we propose to fit a piecewise (simple, multiple, or stepwise) linear logistic regression model by recursively partitioning the data and fitting a different logistic regression in each partition. This allows nonlinear features of the data to be modeled without requiring variable transformations. Trend-adjusted chi-square tests are used to control bias in variable selection at the intermediate nodes, which protects the integrity of inferences drawn from the tree structure. The binary tree that results from the partitioning process is pruned to minimize a cross-validation estimate of the predicted deviance, which obviates the need for a formal goodness-of-fit test. Our algorithm, called "LOTUS", is compared with standard stepwise logistic regression and two well-known classification tree algorithms (QUEST and C4.5) on 13 real datasets, several of which contain tens to hundreds of thousands of observations. Results will be presented at this talk.
Wednesday January 18, 2006 at 3:30 PM in SEO 512