Hi, I recently started evaluating the LingPipe SDK and I have a few problems and questions. I built a big corpus with around 50 text categories that represent ...
... What exactly do you mean by text styles? As in writing styles like letter, e-mail, technical article, etc.? Are those styles mutually exclusive? If not,...
I will try to clarify my issue with regard to your questions. I have around 60 text categories; for each category I have between 200 and 1000 text files that...
... It's the amount of text, not the number of files, that determines sizing. ... OK -- that's fairly small. ... You probably want an NGramProcess classifier unless...
Hi Bob, I took your advice and pruned the small n-grams; I used 3 as the count, and it solved the Java heap problem. You wrote: You need to train the individual...
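For readers unfamiliar with count-based pruning, here is a minimal plain-Java sketch of the idea, assuming "small" above means n-grams whose count falls below a threshold of 3. The class and method names (`NGramPruneSketch`, `countNGrams`, `prune`) are hypothetical and this is not LingPipe's implementation; it only illustrates why dropping rare n-grams shrinks the model's memory footprint.

```java
import java.util.HashMap;
import java.util.Map;

public class NGramPruneSketch {

    // Count every character n-gram of length 1..maxN that occurs in the text.
    static Map<String, Integer> countNGrams(String text, int maxN) {
        Map<String, Integer> counts = new HashMap<>();
        for (int start = 0; start < text.length(); start++) {
            for (int len = 1; len <= maxN && start + len <= text.length(); len++) {
                counts.merge(text.substring(start, start + len), 1, Integer::sum);
            }
        }
        return counts;
    }

    // Drop every n-gram seen fewer than minCount times (assumed pruning rule).
    static void prune(Map<String, Integer> counts, int minCount) {
        counts.values().removeIf(c -> c < minCount);
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countNGrams("abracadabra abracadabra", 3);
        int before = counts.size();
        prune(counts, 3);
        System.out.println(before + " n-grams before pruning, " + counts.size() + " after");
    }
}
```

Most distinct n-grams in real text are rare, so even a low threshold usually removes a large share of the entries while keeping the frequent ones that carry most of the statistical weight.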
Overall, you might want to rethink your classifier. Are the categories mutually exclusive and do they collectively cover all input types? Is there any ...
Hi Bob, first I would like to put in a good word. I have worked with open source projects in the past, but never did I get such a response to technical questions, ...
... I get the error "The method compileTo(Compilable, File) in the type AbstractExternalizable is not applicable for the arguments (NGramProcessLM, String)" Nadav....
... Sorry for the half-baked code sketch. You need to send the compilation methods files, not strings. So that's: AbstractExternalizable.compileTo(lm,new...
After I built 60 LMs and combined them into one classifier, its size on disk is around 900 MB. Is this normal? When I use it, will it load itself to the...
Hi Bob, another small question: after I combine all the LMs into one classifier, I want to save it to disk. I can't use AbstractExternalizable.compileTo because...
Hi, when I try to classify text using my classifier I get the following error: "Ratios must sum to number greater than zero. Found sum=0.0". Any idea what it...
Nadav: Let's take this discussion offline after this reply. If you have more questions, just send them to lingpipe@... and I'll answer them directly....
Hi, I'm working on an IE project for my final-year project at the university. I'm using LingPipe to annotate sentences for named entity recognition. A GUI...
... Thanks. Let us know if you have suggestions. We're probably going to drop it into the main LingPipe distro after I unplug all the 1.6-dependent Swing ...
... I think the majority of the MUC6 systems are designed for specific usage. A user without programming language knowledge cannot design what information to extract...
... I'm not sure what you mean. Most of the MUC6 systems were designed for MUC6! DARPA was having all of its contractees compete. ... I'm still not sure what...
I have attached a minimal input file which demonstrates the problem. The first entry with PMID=12264172 gets its abstract registered while the second with...
... Aha. The OtherAbstracts are available, just in a different place in both the XML and in our domain object model representation (which follows the DTD...
So it is actually not required for an article to have an abstract according to the requirements in MedlineCitationSet. That is good to know. It is surprising...
Dear all, with the help of knowledge posted previously on this mailing list I compiled the program to extract PMIDs and MeSH descriptors from MEDLINE citation:...
I found the solution myself: modify line ... to System.out.println(citation.pmid() + "|" + lists[i].descriptor().topic()); And the result looks like requested...
Hi, the testing I did on the spell-check part gives around 110 ms for each spell check. The training set is pretty small, and contains around 200 characters. Is it...
... We've trained spell checkers on dozens of GBs of data with a large lexicon and gotten run times of 100 queries/sec (10ms/query) for queries with averages...
Thank you very much for your reply. The data I used for the testing is as follows: " for (int i = 0; i < 100; ++i) { // trainer.train("abracadabra "); ...
... I'm not sure how Eclipse is going to run the JUnit code. If it forks a new JVM for each test, I'd think it'd be even slower than you report. The test does...
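Since the figures in this thread (around 110 ms per spell check versus 100 queries/sec, i.e. 10 ms/query) depend heavily on how the measurement is taken, here is a minimal sketch of timing a call after a warm-up phase inside a single JVM. The class `TimingSketch` and its methods are hypothetical, and the `Runnable` in `main` is only a stand-in for the real spell-check call; this is not the test code discussed above.

```java
public class TimingSketch {

    // Average wall-clock milliseconds per call, measured after warm-up runs
    // so that class loading and JIT compilation are mostly excluded.
    static double averageMillis(Runnable task, int warmupRuns, int timedRuns) {
        for (int i = 0; i < warmupRuns; i++) {
            task.run();
        }
        long start = System.nanoTime();
        for (int i = 0; i < timedRuns; i++) {
            task.run();
        }
        long elapsedNanos = System.nanoTime() - start;
        return elapsedNanos / 1_000_000.0 / timedRuns;
    }

    // Convert per-query latency to throughput, e.g. 10 ms/query is 100 queries/sec.
    static double queriesPerSecond(double millisPerQuery) {
        return 1000.0 / millisPerQuery;
    }

    public static void main(String[] args) {
        // Placeholder workload; substitute the actual spell-check call here.
        StringBuilder sink = new StringBuilder();
        Runnable task = () -> {
            sink.setLength(0);
            sink.append("abracadabra").reverse();
        };
        double ms = averageMillis(task, 1_000, 10_000);
        System.out.println(ms + " ms/call, " + queriesPerSecond(ms) + " calls/sec");
    }
}
```

Timing a single call in a freshly started JVM mostly measures startup and compilation overhead, so averaging over many calls after warm-up tends to give a number closer to steady-state throughput.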