... That's about what our demo does. But we don't have dates turned on in the demo models, and if we get this case right, "McDonald's" will be of type...
I want to train the Named Entity Detector to extract entities that I specify. In the command demo it says this can be done using WordFreak. If I did the...
Hi. Yes, I looked through the demo and saw that method, but what I get confused by is that in that method I need an already defined NEChunker. But how do I...
... We ship a couple of them. One's trained on English news and the other on genomics data from MEDLINE. For instance, here's the code to create one in the...
... Do you have WordFreak working from the included command? ... Have the annotated files in a directory and run the training command as illustrated in the...
Hi, I have used EN_GENOMICS.model to identify biomedical terms in some MEDLINE abstracts. I need the term class information in the tagged abstracts for our...
Hi. In one of the recent updates of LingPipe (starting with 2.2.0, I think), has there been a change in how LingPipe marks sentence boundaries when there is an...
... Yes. But it's easy to recreate the old behavior. ... But it's possible to change this based on the parameters in the models. The long answer is the...
Bob- ... I think that I might be accidentally training with the same data multiple times. Can you explain a little more about how to use the "cross-entropy...
... The number is a log (base 2) probability. Probabilities are numbers between 0 and 1. For instance, log2(1) = 0, log2(1/2) = -1, log2(1/4) = -2, log2(1/8) = -3...
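For readers following the log (base 2) discussion, here is a minimal Java sketch of the arithmetic. It does not use LingPipe; the class and method names are made up for illustration. The per-symbol average is one common way to turn a total log2 probability into a cross-entropy rate, and it is an assumption here, since the reply above is cut off before it says how the number should be normalized.

```java
// Hypothetical helper class; not part of LingPipe.
final class Log2Sketch {

    // log base 2 of a probability in (0, 1]; the result is always <= 0
    static double log2(double p) {
        return Math.log(p) / Math.log(2.0);
    }

    // inverse operation: recover the probability from its log2 value
    static double fromLog2(double log2Prob) {
        return Math.pow(2.0, log2Prob);
    }

    // Assumption: cross-entropy rate taken as the negated total log2
    // probability divided by the number of symbols scored.
    static double perSymbolCrossEntropy(double totalLog2Prob, int numSymbols) {
        return -totalLog2Prob / numSymbols;
    }

    public static void main(String[] args) {
        double[] probs = {1.0, 0.5, 0.25, 0.125};
        for (double p : probs) {
            System.out.println("log2(" + p + ") = " + log2(p));
        }
        // a total log2 probability of -12 over 4 symbols
        System.out.println(perSymbolCrossEntropy(-12.0, 4));
    }
}
```

Lower (closer to zero) per-symbol values mean the model found the text more predictable. Text the model was trained on will usually score noticeably better than unseen text.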
I tried to parse MEDLINE with LingPipe 2.2.1. When I used MedlineCitation.keywordLists, it always returned a list with 0 length. Can anyone help? Thanks...
... Are you sure you're looking at citations that have keyword lists? Most of them don't seem to have keywords. If you know you have keyword lists that...
I tried using the code below (trained on the EN_NEWS model) to do some named entity annotation, and I got a few errors I was wondering how to fix. Namely, in...
... I wish fixing this was so easy. The models are statistical and reflect the distributional aspects of their training data perhaps a bit too closely. The...
Apologies for the new spate of spam. New memberships now require approval and posts are moderated as well. Once the banned member info gets propagated across...
Hi I'm trying to use the class com.aliasi.dict.ApproxDictionaryChunker to do some fuzzy matching. I created a trie-dictionary with all my dictionary terms and...
... Your mother board is facing south--it will never work that way because of some bad experiences I had as a youth.... But on a more serious note, you just...
I'm still struggling with working with large language models. My DynamicLMClassifier trained with an ngram of 8 over 100,000 documents of about 1K each results...
... Pruning should make a very big difference in size, even pruning to min count of 2. If that isn't making your models much smaller, could you send the code...
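To illustrate why pruning to a minimum count of 2 can shrink a model so much, here is a small self-contained Java sketch. It counts character n-grams in a plain map and drops every entry seen fewer than `minCount` times. This is not LingPipe's implementation or API; the class and method names are hypothetical, and a real language model stores counts in a trie rather than a hash map.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of count-based pruning; not LingPipe code.
final class NGramPruneSketch {

    // Count every character n-gram of length 1..maxN in the text.
    static Map<String, Integer> countNGrams(String text, int maxN) {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i < text.length(); i++) {
            for (int n = 1; n <= maxN && i + n <= text.length(); n++) {
                counts.merge(text.substring(i, i + n), 1, Integer::sum);
            }
        }
        return counts;
    }

    // Remove all n-grams whose count is below minCount (in place).
    static void prune(Map<String, Integer> counts, int minCount) {
        counts.values().removeIf(c -> c < minCount);
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countNGrams("abab", 2);
        System.out.println("before: " + counts.size());
        prune(counts, 2);
        System.out.println("after:  " + counts.size());
    }
}
```

In natural-language text, most long n-grams occur exactly once, so removing the singletons usually removes the bulk of the entries.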
Hi! I am working on my thesis, resolving anaphora related to geo-references in Spanish texts. I would like to know if it's possible to use LingPipe to find...
Bob- ... My pruning code is pretty simple: private static void prune (DynamicLMClassifier classifier, int threshold, long trainedBytes) { if (threshold < 2) ...
... I assume that you are going to be developing your own anaphora resolution algorithms--the one in LingPipe is based on my thesis, heuristic and going to be...
... The algorithm's javadoc-ed, but it assumes first-best named-entity input and only performs coreference within a single document -- not across documents. ...
... The FixedWeightEditDistance() no-arg constructor sets match weight to zero (0), and all other weights to infinity. I'm guessing you don't have upper-case...
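The effect of a zero match weight with every other weight infinite can be seen in a small stand-alone sketch. The class below is hypothetical and is not LingPipe's FixedWeightEditDistance: it is a textbook dynamic-programming edit distance with configurable positive costs, and LingPipe's own sign convention and constructor arguments may differ. With all edit costs infinite, only exact string matches get a finite distance, so a fuzzy matcher configured this way behaves like an exact matcher.

```java
// Hypothetical stand-in for a fixed-weight edit distance; not LingPipe code.
final class WeightedEditDistanceSketch {
    private final double match, insert, delete, substitute;

    WeightedEditDistanceSketch(double match, double insert,
                               double delete, double substitute) {
        this.match = match;
        this.insert = insert;
        this.delete = delete;
        this.substitute = substitute;
    }

    // Minimum total cost of turning a into b with the configured costs.
    double distance(String a, String b) {
        int n = a.length(), m = b.length();
        double[][] d = new double[n + 1][m + 1];
        for (int i = 1; i <= n; i++) d[i][0] = d[i - 1][0] + delete;
        for (int j = 1; j <= m; j++) d[0][j] = d[0][j - 1] + insert;
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                boolean same = a.charAt(i - 1) == b.charAt(j - 1);
                double diag = d[i - 1][j - 1] + (same ? match : substitute);
                double del = d[i - 1][j] + delete;
                double ins = d[i][j - 1] + insert;
                d[i][j] = Math.min(diag, Math.min(del, ins));
            }
        }
        return d[n][m];
    }

    public static void main(String[] args) {
        double inf = Double.POSITIVE_INFINITY;
        WeightedEditDistanceSketch strict =
            new WeightedEditDistanceSketch(0.0, inf, inf, inf);
        WeightedEditDistanceSketch unit =
            new WeightedEditDistanceSketch(0.0, 1.0, 1.0, 1.0);
        System.out.println(strict.distance("abc", "abc")); // exact match
        System.out.println(strict.distance("abc", "Abc")); // case differs
        System.out.println(unit.distance("abc", "Abc"));   // one substitution
    }
}
```

Giving the edit operations finite costs, as in the `unit` instance, is what allows near matches such as a case difference to come back with a small finite distance.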
Hi, I'm comparing the performance of various coreference resolvers on the MUC6/7 corpora. At the moment I'm looking at OpenNLP, JavaRAP, and LingPipe. So far...
... I assume these are recall then precision numbers. Those numbers are pretty bad for all the systems-- <flashback> My first "big" NLP project was assembling...
Hi, I'm working on an application in which I need to train a classifier at different times, so the appropriate class seems to be DynamicLMClassifier, since...
... To catch people up, David sent me some data and I have had a chance to look at it. LingPipe output is poo-poo-ca-ca (that is a technical term in comp ling)...
Breck, Hopefully I'm not sticking my foot in my mouth here, but could you take another look? I'm aware of the offset errors -- but, as I believe I noted in...