>
> When dealing with proprietary methodology, it's (practically)
> impossible to study the properties of the method thoroughly.
> Personally, I feel uncomfortable using a method that can't
> be evaluated objectively by fellow researchers. It may be OK if the
> application has nothing to do with human experimentation (as in
> Biostatistics). Since most (if not all) applications of Data Mining
> are in commerce, the risk of using unproven methodology that hasn't
> been extensively scrutinized may be acceptable.
You aren't the only one. As an archaeologist who uses statistics
extensively, I find it really helps to know the assumptions and how
the algorithm works.
As an example, Mike Baxter of Nottingham-Trent University has some
forthcoming papers (currently available by snail mail as departmental
technical reports) on the use of model-based clustering to analyze
geochemical data from pottery. Very different results occur depending
on the program one uses (in this case EMMIX and MCLUST), because the
programs differ in how they implement the EM algorithm and in their
maximization criteria.
So, combine that with the fact that a lot of our data is non-normal to
begin with, and it's no wonder most archaeologists these days avoid
anything involving math, statistics, etc.
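The post doesn't say which implementation details made EMMIX and MCLUST disagree. One general reason model-based clustering programs can part ways is that EM only climbs to a local maximum of the likelihood, so the starting values and stopping rules each program chooses can change the answer. The sketch below is a minimal, hypothetical illustration of that point, not EMMIX's or MCLUST's code: a one-dimensional, two-component Gaussian mixture fitted by EM from two different starts. The function name `em_two_gaussians`, the toy data, the starting values, the fixed iteration count and the variance floor are all assumptions made for illustration.

```python
import math


def normal_pdf(x, mu, var):
    """Density of a normal distribution with mean mu and variance var."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_two_gaussians(data, mu, var, weight, iters=100, var_floor=1e-6):
    """Hypothetical sketch: fit a 1-D two-component Gaussian mixture by EM.

    mu, var: starting means and variances, [component 0, component 1]
    weight:  starting mixing proportion of component 0 (assumed)
    Returns final means, variances, weight and the log-likelihood history.
    """
    mu, var = list(mu), list(var)
    history = []
    n = len(data)
    for _ in range(iters):
        # E-step: responsibility of component 0 for each point,
        # plus the log-likelihood at the current parameters.
        resp = []
        loglik = 0.0
        for x in data:
            p0 = weight * normal_pdf(x, mu[0], var[0])
            p1 = (1 - weight) * normal_pdf(x, mu[1], var[1])
            total = p0 + p1
            loglik += math.log(total)
            resp.append(p0 / total)
        history.append(loglik)
        # M-step: re-estimate weight, means and variances from responsibilities.
        n0 = sum(resp)
        n1 = n - n0
        weight = n0 / n
        mu[0] = sum(r * x for r, x in zip(resp, data)) / n0
        mu[1] = sum((1 - r) * x for r, x in zip(resp, data)) / n1
        # Assumed safeguard: keep variances away from zero.
        var[0] = max(var_floor, sum(r * (x - mu[0]) ** 2 for r, x in zip(resp, data)) / n0)
        var[1] = max(var_floor, sum((1 - r) * (x - mu[1]) ** 2 for r, x in zip(resp, data)) / n1)
    return mu, var, weight, history


# Toy data (invented): three tight groups near 0, 5 and 10.
data = [-0.2, 0.0, 0.2, 4.8, 5.0, 5.2, 9.8, 10.0, 10.2]

# Same data, same model, two different starting points.
mu_a, var_a, w_a, hist_a = em_two_gaussians(data, mu=[0.0, 7.5], var=[1.0, 7.0], weight=1 / 3)
mu_b, var_b, w_b, hist_b = em_two_gaussians(data, mu=[2.5, 10.0], var=[7.0, 1.0], weight=2 / 3)

print("start A ->", [round(m, 2) for m in mu_a], round(hist_a[-1], 4))
print("start B ->", [round(m, 2) for m in mu_b], round(hist_b[-1], 4))
```

With these assumed starts, run A settles roughly on {0} versus {5, 10} and run B on {0, 5} versus {10}. Because this toy data set is symmetric about 5, both solutions fit equally well, so the likelihood alone doesn't single out one grouping; which one you get depends on where the algorithm starts.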
Sorry these comments have little to do with RP per se, but it is a
very similar situation.
Later, Mark Hall