I just uploaded a new version of the site,
including version 1.0.7 of LingPipe:
http://www.aliasi.com/lingpipe
It addresses some issues raised on this mailing
list and/or sent directly to our help email address.
Thanks for all the comments.
1.0.7 patches some bugs and also adds some features
necessary to allow dictionaries used for training
the estimator to play nicely with the tag model for
decoding. I fixed the bug that kept dictionary-trained
tags from being added to the tag symbol table.
I then added a smoothing parameter: a count that is
added to the estimator for P(Tag2|Tag1) wherever
Tag1,Tag2 is a legal sequence (basically, Tag2 isn't
a continuation of a category that doesn't match Tag1).
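
To make the smoothing concrete, here is a minimal sketch in Java.
It is not LingPipe's actual code: the class name, the B-/I-/O tag
naming, and the renormalization over legal successors are all
assumptions made for illustration. The only parts taken from the
description above are that a fixed count is added to P(Tag2|Tag1)
and that it is added only when Tag1,Tag2 is a legal sequence.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of the LingPipe API.
class SmoothedTagTransitions {
    private final List<String> tags;   // the tag symbol table
    private final double smoothing;    // count added to each legal transition
    private final Map<String, Map<String, Double>> counts = new HashMap<>();

    SmoothedTagTransitions(List<String> tags, double smoothing) {
        this.tags = new ArrayList<>(tags);
        this.smoothing = smoothing;
    }

    // Assumed naming: "O" is outside, "B-X" starts category X,
    // "I-X" continues category X.
    static String category(String tag) {
        return tag.equals("O") ? null : tag.substring(2);
    }

    // Illegal only when tag2 continues a category that tag1 is not in.
    static boolean isLegal(String tag1, String tag2) {
        if (!tag2.startsWith("I-")) {
            return true;
        }
        return category(tag2).equals(category(tag1));
    }

    // Record one observed transition tag1 -> tag2.
    void train(String tag1, String tag2) {
        counts.computeIfAbsent(tag1, k -> new HashMap<>())
              .merge(tag2, 1.0, Double::sum);
    }

    // (count + smoothing if legal) / (sum of the same quantity over
    // every tag in the symbol table). Tags trained on but missing from
    // the symbol table would be ignored in the denominator.
    double probability(String tag1, String tag2) {
        Map<String, Double> row = counts.getOrDefault(tag1, new HashMap<>());
        double numerator = row.getOrDefault(tag2, 0.0);
        if (isLegal(tag1, tag2)) {
            numerator += smoothing;
        }
        double denominator = 0.0;
        for (String t : tags) {
            denominator += row.getOrDefault(t, 0.0);
            if (isLegal(tag1, t)) {
                denominator += smoothing;
            }
        }
        return denominator == 0.0 ? 0.0 : numerator / denominator;
    }

    public static void main(String[] args) {
        List<String> tags = List.of("O", "B-PER", "I-PER", "B-LOC", "I-LOC");
        SmoothedTagTransitions model = new SmoothedTagTransitions(tags, 1.0);
        model.train("O", "B-PER");
        model.train("B-PER", "I-PER");
        model.train("I-PER", "O");
        // Unseen but legal: receives mass from the smoothing count.
        System.out.println(model.probability("O", "B-LOC"));
        // Unseen and illegal: stays at zero.
        System.out.println(model.probability("O", "I-LOC"));
    }
}
```

With a smoothing count of 1.0 and the three training transitions in
main, the unseen legal transition O,B-LOC comes out at 0.25, while
the illegal transition O,I-LOC stays at 0.
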
I then fixed the coref behavior so that categories
other than those specified in MUC (PERSON, ORGANIZATION
and LOCATION) get a very generic matching behavior
instead of causing exceptions to be thrown.
Please let me know if you have comments, suggestions
or questions. I appreciate all the feedback I can get.
I'll close with some teasers about our future plans.
LingPipe 2.0 is in the coding stage, and will contain
(a) part-of-speech tagging, (b) various n-gram language models,
and (c) a language-model based classifier. The time frame for
2.0 is before the end of 2004.
At the whiteboard stage, our two-year plan (by end of 2006)
is to add (a) n-best and lattice-based ambiguity
packing along with confidence evaluation, (b) full parsing
with confidence, and (c) within- and cross-document coref
by statistical clustering.
- Bob Carpenter
carp@...