LingPipe
Re: Training LingPipe [w. new corpus format]
Message #69 of 796


> 1. Convert your data into a format
> LingPipe understands, such as the CoNLL
> line-based text format or the MUC XML format.

All of our documents are in XML, but we use TEI markup. Using
NoteTabPro, I wrote a simple clip that converts MUC to TEI notation,
and I believe I can do the reverse just as easily, so the conversion
shouldn't be a problem.
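
For readers following along, the reverse direction (TEI to MUC) can be sketched in a few lines of Python. This is only an illustration under assumptions not stated in the post: it assumes the TEI files mark entities with the elements persName, placeName, and orgName, and that the target is MUC-style ENAMEX tags with the types PERSON, LOCATION, and ORGANIZATION. The function name tei_to_muc and the mapping table are made up for the example, and self-closing or nested name elements are not handled.

```python
import re

# Assumed mapping from TEI name elements to MUC ENAMEX types.
# Adjust to whatever elements the TEI files actually use.
TEI_TO_MUC = {
    "persName": "PERSON",
    "placeName": "LOCATION",
    "orgName": "ORGANIZATION",
}

# Matches an opening or closing tag for any mapped TEI element,
# including opening tags that carry attributes.
_TAG = re.compile(r"<(/?)(%s)\b[^>]*>" % "|".join(TEI_TO_MUC))


def tei_to_muc(text):
    """Rewrite mapped TEI name tags as MUC ENAMEX tags."""
    def swap(match):
        closing, name = match.group(1), match.group(2)
        if closing:
            return "</ENAMEX>"
        # TEI attributes on the opening tag are dropped.
        return '<ENAMEX TYPE="%s">' % TEI_TO_MUC[name]
    return _TAG.sub(swap, text)


if __name__ == "__main__":
    sample = '<persName>Jason</persName> works in <placeName ref="#x">Ohio</placeName>.'
    print(tei_to_muc(sample))
```

A regex substitution is enough when the tags are flat, as in the sample; heavily nested TEI would call for a real XML parser instead.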

Once the XML file is in MUC format, do I need to modify it any
further? And what about headers? We have a lot of archival
information in the XML header. Should I delete the header for
training? Also, can I use multiple XML files, or should I combine all
the documents into one for training? And finally, what command do I
need to use to train? Is it this?

java NETrainCommand -MUC myTrainingFile.xml
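
If it turns out that the header has to go and the documents have to be combined (the post leaves both questions open), the preprocessing could look roughly like the Python sketch below. It assumes the archival information sits in a TEI teiHeader element. The DOCS wrapper element and the function names strip_tei_header and combine_documents are hypothetical, chosen only for the example, and are not something LingPipe is known to require.

```python
import re

# Matches a whole <teiHeader>...</teiHeader> element plus trailing whitespace.
_HEADER = re.compile(r"<teiHeader\b.*?</teiHeader>\s*", re.DOTALL)


def strip_tei_header(xml_text):
    """Return the document text with any teiHeader element removed."""
    return _HEADER.sub("", xml_text)


def combine_documents(xml_texts, root="DOCS"):
    """Strip headers and wrap all documents in one root element.

    The root element name is a placeholder, not a LingPipe requirement.
    """
    parts = [strip_tei_header(text).strip() for text in xml_texts]
    return "<%s>\n%s\n</%s>" % (root, "\n".join(parts), root)


if __name__ == "__main__":
    docs = [
        "<doc><teiHeader><title>Letter 1</title></teiHeader>\n<text>One</text></doc>",
        "<doc><text>Two</text></doc>",
    ]
    print(combine_documents(docs))
```

Keeping the originals untouched and writing the stripped, combined text to a separate training file means the archival headers are never lost.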


As for the second option, it was all gibberish to me, so I think the
first way is better. Thanks a lot for your help.

Jason Goltermann








Thu Mar 10, 2005 2:19 pm

jkriil

Bob Carpenter (colloquialdo...), Mar 9, 2005 10:28 pm
... Yes, that should be more than enough. Are you tagging different kinds of entities? The more entity types there are, the more training data you need to ...

jkriil, Mar 10, 2005 2:20 pm
... All of our documents are in XML format, but we are using TEI notation. Using NoteTabPro I wrote a simple clip that converts MUC to TEI notation. I believe...

Bob Carpenter (colloquialdo...), Mar 10, 2005 5:58 pm
... Great. That's the hardest part of the whole process. I have to do it all the time for evaluations. I looked up TEI and found more than I bargained for. I...