training and run-time dictionaries
I believe this pertains to a number of messages that
arrived when I was on an extended vacation. This is
how things *should* work. I'm going to do a number
of tests to make sure they do work this way to sort
out some of the more specific questions.

Dictionaries can be used in two different ways with
the LingPipe commands and API.

1. Dictionaries can be specified as part of training an estimator.
In this case, they increment training counts, as Sun Liu
hypothesized. Documentation can be found with the
ne.NEDictionaryTrain class. I'll have to add documentation
to indicate exactly which estimates get incremented using
this scheme. For example, "John Smith" as a person increments
the counts for P(John|ST_NAME), P(NAME|ST_NAME,John),
and P(Smith|NAME,ST_NAME,John). It's pretty clear in
the code what's going on (I hope).

2. Dictionaries can also be specified at run time. In this
case, user dictionary entries will be tagged (and will
remove any conflicting entities found by the statistical
model). This is the usage Breck Baldwin was referring to.
The run time method is also used for pronoun dictionaries.

In both cases, tags may be used that are not seen elsewhere
in the training data. If that isn't working for some reason,
please let us know.

I'll excerpt sections 9 and 10 of the getting started
documentation below; they explain the usage for these
two cases.

- Bob Carpenter
carp@...


Excerpted from: http://www.alias-i.com/lingpipe/getting_started.html

9. Run-time Dictionaries
----------------------------------------------------------------------

You can define your own dictionaries to be applied at run time.

Like most of the NE command-line arguments, the user dictionary is
specified in com.aliasi.ne.command.AbstractNECommand.

You can define your own class for the user dictionary by extending
com.aliasi.ne.NEDictionary.

Let's say you call it foo.bar.BikingNEDictionary. Dictionary entries
are added with tokenization information. You can use our tokenizer
for English to create the tokens. There's a static convenience
method:

String phrase = ...;
String[] tokens = new IndoEuropeanTokenizer(phrase).tokenize();

You have to provide a no-argument constructor
that gets called reflectively when you use
the dictionary in a command, as in:

java NEAnnotateCommand ... -userDictionary=foo.bar.BikingNEDictionary
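The reason for the no-argument constructor can be sketched in plain Java: a class named only by a string can be instantiated reflectively, but only through a constructor the loader knows how to call. UserDictionary and BikingDictionary below are hypothetical stand-ins, not LingPipe's NEDictionary, and how AbstractNECommand actually parses -userDictionary is not shown; the sketch only illustrates that Class.forName plus getDeclaredConstructor().newInstance() fails unless an accessible no-argument constructor exists.

```java
/** Hypothetical stand-in for a user-dictionary base class (not LingPipe's NEDictionary). */
abstract class UserDictionary {
    abstract String name();
}

/** Hypothetical user dictionary; the public no-argument constructor is what reflection needs. */
class BikingDictionary extends UserDictionary {
    public BikingDictionary() {
        // A real dictionary would add its tokenized entries here.
    }

    @Override
    String name() {
        return "biking";
    }
}

public class ReflectiveLoadSketch {

    /** Instantiates a dictionary class by name through its no-argument constructor. */
    static UserDictionary load(String className) {
        try {
            // asSubclass throws ClassCastException if the class is not a UserDictionary.
            Class<? extends UserDictionary> cls =
                Class.forName(className).asSubclass(UserDictionary.class);
            // Throws NoSuchMethodException if there is no no-argument constructor.
            return cls.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalArgumentException("cannot load dictionary " + className, e);
        }
    }

    public static void main(String[] args) {
        // In a command, the class name would come from the -userDictionary= argument.
        UserDictionary dict = load("BikingDictionary");
        System.out.println(dict.name()); // prints "biking"
    }
}
```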

The command then applies the user dictionary after the speculative
(statistical) NE extraction, letting userDictionary entries override
speculative ones. It works left to right: at each token, it takes the
longest matching user dictionary entry and removes any overlapping
speculative mentions.
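A minimal sketch of that left-to-right, longest-match override, assuming mentions are token spans [start, end) and that after a dictionary match the scan resumes just past the matched entry (the text does not say this explicitly). The class LongestMatchOverride, its methods, and the tag names are hypothetical illustrations, not LingPipe's implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of a left-to-right, longest-match user dictionary
 * that overrides overlapping statistical ("speculative") mentions.
 * Not LingPipe's implementation.
 */
public class LongestMatchOverride {

    /** A tagged mention over the token span [start, end). */
    static final class Mention {
        final int start;
        final int end;
        final String tag;

        Mention(int start, int end, String tag) {
            this.start = start;
            this.end = end;
            this.tag = tag;
        }

        boolean overlaps(Mention other) {
            return start < other.end && other.start < end;
        }

        @Override
        public String toString() {
            return start + "-" + end + ":" + tag;
        }
    }

    // Entries keyed by their token sequence; List equality is element-wise.
    private final Map<List<String>, String> entries = new HashMap<>();
    private int maxLength = 0;

    public void addEntry(String[] tokens, String tag) {
        entries.put(Arrays.asList(tokens.clone()), tag);
        maxLength = Math.max(maxLength, tokens.length);
    }

    public List<Mention> apply(String[] tokens, List<Mention> speculative) {
        List<String> tokenList = Arrays.asList(tokens);
        List<Mention> user = new ArrayList<>();
        int i = 0;
        while (i < tokens.length) {
            Mention match = null;
            // Try the longest candidates first, so the first hit is the longest match.
            for (int len = Math.min(maxLength, tokens.length - i); len > 0; len--) {
                String tag = entries.get(tokenList.subList(i, i + len));
                if (tag != null) {
                    match = new Mention(i, i + len, tag);
                    break;
                }
            }
            if (match != null) {
                user.add(match);
                i = match.end; // assumption: resume scanning after the match
            } else {
                i++;
            }
        }
        // Keep only speculative mentions that overlap no user-dictionary mention.
        List<Mention> result = new ArrayList<>(user);
        for (Mention m : speculative) {
            boolean overridden = user.stream().anyMatch(u -> u.overlaps(m));
            if (!overridden) {
                result.add(m);
            }
        }
        result.sort(Comparator.comparingInt(m -> m.start));
        return result;
    }

    public static void main(String[] args) {
        LongestMatchOverride dict = new LongestMatchOverride();
        dict.addEntry(new String[] {"US"}, "LOCATION");
        dict.addEntry(new String[] {"US", "Postal", "Service"}, "ORGANIZATION");
        String[] tokens = {"Lance", "Armstrong", "rode", "for", "US", "Postal", "Service"};
        List<Mention> speculative = List.of(
            new Mention(0, 2, "PERSON"),
            new Mention(4, 6, "ORGANIZATION"));
        System.out.println(dict.apply(tokens, speculative));
        // prints [0-2:PERSON, 4-7:ORGANIZATION]
    }
}
```

In the example, the three-token entry wins over the shorter "US" entry at token 4, and the speculative 4-6 mention is dropped because it overlaps the dictionary match, while the non-overlapping PERSON mention survives.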



10. Dictionary-based training data
----------------------------------------------------------------------

If you're doing your own training, you can provide a file of
dictionary entries to be added to the text training files. As
noted in AbstractNECommand (see section 9 above), this is
specified to the command as:

java NETrainCommand ... -dictionary=DictionaryFilePath

The format of the dictionary is as specified in:

http://www.aliasi.com/lingpipe/1_0_6/javadoc/com/aliasi/ne/NEDictionaryTrain.html

This increments the training data for each phrase as if it had
been seen the given count number of times.






Thu Aug 5, 2004 8:59 pm
