NER for multi language documents
Message #635 of 777
Re: [LingPipe] NER for multi language documents

marco turchi wrote:
> Dear All,
> my name is Marco, and I'm new in this group and as lingpipe user, so do not
> hate me for some silly questions :-)
>
> My research project requires to extract Named Entity from documents written
> in different languages. I have read on the lingpipe web site that it allows
> to do it, but I do not understand if training data for the NE extractor are
> available for research purposes or they are present only in the Developer
> version of the product.
>
> Please can u help me?

We don't distribute any data, but our named entity tutorial
points to some sources of data. ELRA and LDC also distribute
data, but it's expensive. Most of the free data sets have
restrictive licenses.

You can also create your own training data using our
citationEntities sandbox project, but it's a lot of work.

Does the recognizer need to be multilingual in the
sense of handling documents that mix multiple languages? Or
do you just need NE for several languages, one per document?
If each document contains only a single language, you can
often run language identification first and then apply a
recognizer trained for that language.
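
The identify-then-dispatch idea can be sketched roughly as below.
This is a toy illustration in Python, not LingPipe code: the
stopword-count language guesser and the gazetteer "recognizers"
are hypothetical stand-ins for a real language identifier and
real per-language NE models, and all names are made up for the
example.

```python
from typing import Callable, Dict, List, Set, Tuple

Entity = Tuple[str, str]  # (surface text, entity type)

# Tiny stopword lists standing in for a real language identifier (assumption).
STOPWORDS: Dict[str, Set[str]] = {
    "en": {"the", "and", "of", "is", "in"},
    "es": {"el", "la", "y", "de", "es"},
    "de": {"der", "die", "und", "ist", "das"},
}


def identify_language(text: str) -> str:
    """Guess the language by counting stopword hits per language.

    Ties (including no hits at all) fall back to the first language listed.
    """
    tokens = text.lower().split()
    scores = {lang: sum(tok in words for tok in tokens)
              for lang, words in STOPWORDS.items()}
    return max(scores, key=scores.get)


def make_gazetteer_recognizer(gazetteer: Dict[str, str]) -> Callable[[str], List[Entity]]:
    """Build a placeholder recognizer that looks tokens up in a fixed dictionary."""
    def recognize(text: str) -> List[Entity]:
        tokens = text.replace(".", " ").split()
        return [(tok, gazetteer[tok]) for tok in tokens if tok in gazetteer]
    return recognize


# One recognizer per language; a real system would load trained models here.
RECOGNIZERS: Dict[str, Callable[[str], List[Entity]]] = {
    "en": make_gazetteer_recognizer({"Paris": "LOCATION"}),
    "es": make_gazetteer_recognizer({"Madrid": "LOCATION"}),
    "de": make_gazetteer_recognizer({"Berlin": "LOCATION"}),
}


def extract_entities(text: str) -> Tuple[str, List[Entity]]:
    """Identify the document language, then run that language's recognizer."""
    lang = identify_language(text)
    return lang, RECOGNIZERS[lang](text)


if __name__ == "__main__":
    print(extract_entities("The capital of France is Paris."))
```

This only makes sense when each document is in one language; a
document that switches language midway would be routed to a
single recognizer.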

For real multilingual apps, you'll need to train
a single model with data from different languages.
This has worked pretty well in our experience, at least
for English and Hindi.
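
Pooling the data for a single model might look something like
the sketch below. It is only an illustration in Python, not
LingPipe code: it assumes token/tag training sentences that share
one tag set across languages, the two example sentences are made
up, and the actual training step is left out.

```python
import random
from typing import Dict, List, Tuple

TaggedSentence = List[Tuple[str, str]]  # [(token, tag), ...]

# Made-up one-sentence corpora; real training data would be far larger.
CORPORA: Dict[str, List[TaggedSentence]] = {
    "en": [[("Delhi", "B-LOC"), ("is", "O"), ("in", "O"), ("India", "B-LOC")]],
    "hi": [[("दिल्ली", "B-LOC"), ("भारत", "B-LOC"), ("में", "O"), ("है", "O")]],
}


def pool_training_data(corpora: Dict[str, List[TaggedSentence]],
                       seed: int = 0) -> List[TaggedSentence]:
    """Merge per-language corpora into one shuffled training set.

    Shuffling with a fixed seed keeps the result reproducible while
    preventing the trainer from seeing one language's data in a block.
    """
    pooled = [sent for lang in sorted(corpora) for sent in corpora[lang]]
    random.Random(seed).shuffle(pooled)
    return pooled


if __name__ == "__main__":
    pooled = pool_training_data(CORPORA)
    tags = {tag for sent in pooled for _, tag in sent}
    print(len(pooled), "sentences, tags:", sorted(tags))
```

The pooled list would then be passed to whatever trainer is in
use, so that one model sees examples from every language.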

- Bob Carpenter
Alias-i



Tue Oct 14, 2008 5:17 pm

colloquialdo...
marco turchi (marco.turchi2), Oct 14, 2008 5:12 pm
Dear All, my name is Marco, and I'm new in this group and as lingpipe user, so do not hate me for some silly questions :-) My research project requires to...

Bob Carpenter (colloquialdo...), Oct 14, 2008 5:17 pm
... We don't distribute any data, but our named entity tutorial points to some sources of data. ELRA and LDC also distribute data, but it's expensive. Most...

marco turchi (marco.turchi2), Oct 14, 2008 5:33 pm
Dear Bob, my problem is multilanguage in the sense that I handle documents that can be written in German, French, Spanish and so on, each document is written...

Bob Carpenter (colloquialdo...), Oct 14, 2008 5:45 pm
... Nothing free and fast, I'm afraid. We don't have corpora in French or German. Spanish is easy -- you can get it from the CoNLL data. You can get Spanish...