[RP] Can you trust the splits produced by classification trees?
The answer may be NO! I'm going to discuss the case where all attributes
are categorical. The exhaustive search algorithm (described in Breiman,
Friedman, Olshen & Stone, 1984) tends to select categorical attributes with
many levels as the split variable. For a categorical attribute with c
levels, you need to evaluate up to 2^{c-1} - 1 possible binary splits, so
the more levels an attribute has, the more likely it is to be selected as
the split variable just by chance.
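To make the 2^{c-1} - 1 count concrete, here is a minimal sketch that enumerates every distinct two-way split of c unordered levels. The function name binary_splits is hypothetical. Fixing the first level on the left side ensures each split is produced once rather than twice (left/right mirror images).

```python
from itertools import combinations

def binary_splits(levels):
    """Yield (left, right) for every distinct two-way split of the levels.

    The first level always goes left, so mirror-image splits are not repeated;
    r < len(rest) keeps the right-hand side non-empty."""
    first, rest = levels[0], levels[1:]
    for r in range(len(rest)):
        for combo in combinations(rest, r):
            left = {first, *combo}
            right = set(levels) - left
            yield left, right

for c in (2, 3, 5, 10):
    n_splits = sum(1 for _ in binary_splits(list(range(c))))
    assert n_splits == 2 ** (c - 1) - 1
    print(c, "levels:", n_splits, "candidate splits")
```

Going from 2 to 10 levels takes the number of candidate splits from 1 to 511, which is the source of the extra chances to win "by chance" described above.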

On the other hand, CHAID and its derivatives (Kass, 1980; Hawkins & Kass,
1982; Biggs, de Ville & Suen, 1991) tend to select categorical attributes
with few levels, because the algorithm penalizes categorical attributes
with many levels too severely. Of these, the adjustment proposed by Biggs
et al. (1991) seems to be the least conservative.
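As an illustration of how steep such a penalty can be, the sketch below assumes the Bonferroni multiplier as it is usually stated for Kass's (1980) CHAID: for a nominal attribute whose c levels are merged into r groups, the multiplier is the Stirling number of the second kind S(c, r); for an ordinal attribute it is C(c-1, r-1). The function names are hypothetical, and the Biggs et al. (1991) adjustment is not modeled here.

```python
from math import comb, factorial

def stirling2(c, r):
    """Number of ways to partition c nominal levels into r non-empty groups."""
    total = sum((-1) ** i * comb(r, i) * (r - i) ** c for i in range(r + 1))
    return total // factorial(r)  # exact: the sum is always divisible by r!

def kass_multiplier(c, r, ordinal=False):
    """Assumed Kass-style Bonferroni multiplier for merging c levels into r groups."""
    return comb(c - 1, r - 1) if ordinal else stirling2(c, r)

def adjusted_p(p, c, r, ordinal=False):
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, kass_multiplier(c, r, ordinal) * p)

# Multiplier for a two-group (binary) merge as the number of levels grows
for c in range(2, 11):
    print(c, kass_multiplier(c, 2), adjusted_p(0.001, c, 2))
```

For r = 2 the nominal multiplier S(c, 2) equals 2^{c-1} - 1, the same split count as in the exhaustive search, so an attribute with 10 levels has its p-value multiplied by 511.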

QUEST, CRUISE, and PLUS also tend to select categorical attributes with few
levels when all categorical attributes are "equally informative" with
respect to the dependent variable. This is an artifact of Pearson's
chi-square test for independence in a two-way contingency table.
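For reference, here is a minimal sketch of the test mentioned above: Pearson's statistic for an attribute-by-class table, its degrees of freedom ((c-1) for a c-level attribute and a binary class), and its upper-tail p-value. This is not the QUEST, CRUISE, or PLUS source code, and the function names are hypothetical. The p-value uses the series expansion of the regularized incomplete gamma function and assumes all row and column totals are positive.

```python
import math

def pearson_chi2(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n  # assumes positive margins
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for X ~ chi-square(df).

    Series for the regularized lower incomplete gamma P(df/2, x/2);
    fine for moderate x, not a production-grade routine."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 1
    while term > total * 1e-15 and n < 10_000:
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)

stat, df = pearson_chi2([[10, 20], [30, 40]])
print(stat, df, chi2_sf(stat, df))
# The same statistic value judged at 1 and at 9 degrees of freedom
print(chi2_sf(10.0, 1), chi2_sf(10.0, 9))
```

The last line shows that the p-value attached to a given statistic depends strongly on the degrees of freedom, which in turn depend on the number of levels of the attribute.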

Hence, users of classification tree methods should be cautious when
interpreting the resulting tree diagram if the categorical attributes have
different numbers of levels. The selection bias does not occur when all
categorical attributes have the same number of levels, and there is no
serious bias when all attributes are numerical and have roughly comparable
numbers of distinct values.
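To see the contrast between unequal and equal numbers of levels, here is a small Monte Carlo sketch. It assumes a binary class, Gini impurity as the split criterion, and attributes drawn independently of the class, so that none of them is informative. Each replicate runs an exhaustive search over all 2^{c-1} - 1 splits of each attribute and records which attribute gives the largest impurity decrease. The function names, sample size, and replicate count are illustrative choices and are not taken from the simulation study linked below.

```python
import random
from itertools import combinations

def binary_splits(levels):
    """Yield the left-hand level set of every distinct two-way split."""
    first, rest = levels[0], levels[1:]
    for r in range(len(rest)):
        for combo in combinations(rest, r):
            yield {first, *combo}

def gini(n0, n1):
    """Gini impurity of a node with n0 class-0 and n1 class-1 cases."""
    n = n0 + n1
    if n == 0:
        return 0.0
    p = n0 / n
    return 2 * p * (1 - p)

def best_gain(x, y, n_levels):
    """Largest Gini decrease over all binary splits of a categorical attribute."""
    counts = [[0, 0] for _ in range(n_levels)]
    for xi, yi in zip(x, y):
        counts[xi][yi] += 1
    n = len(y)
    tot0 = sum(c[0] for c in counts)
    tot1 = n - tot0
    parent = gini(tot0, tot1)
    best = 0.0
    for left in binary_splits(list(range(n_levels))):
        l0 = sum(counts[k][0] for k in left)
        l1 = sum(counts[k][1] for k in left)
        r0, r1 = tot0 - l0, tot1 - l1
        child = ((l0 + l1) * gini(l0, l1) + (r0 + r1) * gini(r0, r1)) / n
        best = max(best, parent - child)
    return best

def selection_rates(level_counts, n=100, reps=200, seed=0):
    """Fraction of replicates in which each attribute is chosen as the split.

    All attributes and the class are independent uniform draws, so every
    attribute is equally (un)informative about the class."""
    rng = random.Random(seed)
    wins = [0] * len(level_counts)
    for _ in range(reps):
        y = [rng.randrange(2) for _ in range(n)]
        gains = [best_gain([rng.randrange(c) for _ in range(n)], y, c)
                 for c in level_counts]
        top = max(gains)
        # Break exact ties at random so list order does not bias the result
        winner = rng.choice([j for j, g in enumerate(gains) if g == top])
        wins[winner] += 1
    return [w / reps for w in wins]

print(selection_rates([2, 10]))  # 2-level attribute vs 10-level attribute
print(selection_rates([4, 4]))   # equal numbers of levels
```

Under these assumptions the 10-level attribute should win most replicates in the first run, while the two 4-level attributes should each win roughly half the time in the second.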

The case of mixed attributes (numerical and categorical) is more
complicated, and I haven't studied it in depth. My preliminary simulation
results (not for citation yet) can be downloaded from

http://www.recursive-partitioning.com/plus/split.pdf

Thank you for your attention. I'd welcome any discussion or comments.

--
Tjen-Sien Lim
tslim@...
www.Recursive-Partitioning.com
______________________________________________________________________
Get paid to write a review! http://recursive-partitioning.epinions.com





Mon Jan 10, 2000 5:18 am