We just rolled out LingPipe 3.6.0. As usual, find it on our homepage: http://alias-i.com/lingpipe Here are the details: Intermediate Release The latest...
Dear All, my name is Marco, and I'm new to this group and to LingPipe, so please don't hate me for some silly questions :-) My research project requires to...
... We don't distribute any data, but our named entity tutorial points to some sources of data. ELRA and LDC also distribute data, but it's expensive. Most...
Dear Bob, my problem is multilingual in the sense that I handle documents that can be written in German, French, Spanish and so on, each document is written...
... Nothing free and fast, I'm afraid. We don't have corpora in French or German. Spanish is easy -- you can get it from the CoNLL data. You can get Spanish...
Hello, I'm drawing a blank. In the context of spell checking, is the edit distance used between the user-entered term and the suggested term, or the reverse? I...
... I should make this clearer in the doc for each operation. ... Right. It's the noisy-channel setup, so it's edits going from the suggested term to the...
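The direction only matters when edit costs are asymmetric. With unit costs, Levenshtein distance is symmetric. Here is a plain-Java sketch, not LingPipe's API, of a weighted edit distance. The class name and the insert/delete/substitute costs are hypothetical. It treats the suggested (intended) term as the source of the noisy channel and the user-entered term as its output.

```java
public class NoisyChannelEditDistance {
    // Hypothetical, deliberately asymmetric costs (not LingPipe defaults).
    static final double DELETE_COST = 1.0;
    static final double INSERT_COST = 2.0;
    static final double SUBST_COST = 1.5;

    // Minimum cost of editing 'source' into 'target' (dynamic programming).
    static double distance(String source, String target) {
        int n = source.length(), m = target.length();
        double[][] d = new double[n + 1][m + 1];
        for (int i = 1; i <= n; i++) d[i][0] = i * DELETE_COST;
        for (int j = 1; j <= m; j++) d[0][j] = j * INSERT_COST;
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                double subst = source.charAt(i - 1) == target.charAt(j - 1) ? 0.0 : SUBST_COST;
                d[i][j] = Math.min(d[i - 1][j - 1] + subst,
                          Math.min(d[i - 1][j] + DELETE_COST,
                                   d[i][j - 1] + INSERT_COST));
            }
        }
        return d[n][m];
    }

    public static void main(String[] args) {
        String typed = "speling";
        String suggested = "spelling";
        // Noisy channel: suggested term -> typed term (one deletion).
        System.out.println(distance(suggested, typed)); // 1.0
        // Reverse direction: typed term -> suggested term (one insertion).
        System.out.println(distance(typed, suggested)); // 2.0
    }
}
```

Under these assumed costs, the same pair of strings scores differently depending on which one is treated as the source. That is why the documentation needs to say which way the edits go.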
As of 3.0, the chunking interface completely changed so it's no longer backward compatible with 2.x code. The last version of LingPipe to support the...
I recently downloaded LingPipe 3.6.0 and tried the ChineseToken tutorial sample. I always got a NullPointerException. My environment is: - Windows XP - Java...
... It sure does. Thanks for the detailed bug report. The culprit is the following file: $LINGPIPE/src/com/aliasi/spell/CompiledSpellChecker.java The method...
Hi Bob, I have a question on model quality. I used the ChineseToken sample to generate a words-zh-as.CompiledSpellChecker model, which is 78,303 KB. I...
... The other way to control model size is to take longer n-grams and prune out low-count sequences. If you follow the tutorial, you'll see where we run standard...
Hi Bob, Thanks for replying. Does a longer n-gram model mean more accuracy? How do I prune low-count sequences from a model using LingPipe? I have some...
... Usually longer n-grams means more accuracy up to a point at which accuracy plateaus. Longer n-grams can overfit in some situations compared to shorter...
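To make "prune out low-count sequences" concrete, here is a rough plain-Java sketch. It is not LingPipe's own pruning API, and the class and method names are hypothetical. It counts every character n-gram up to a maximum length, then drops those seen fewer than a threshold number of times.

```java
import java.util.HashMap;
import java.util.Map;

public class NGramPruning {
    // Count every character n-gram of length 1..maxN in the text.
    static Map<String, Integer> countNGrams(String text, int maxN) {
        Map<String, Integer> counts = new HashMap<>();
        for (int start = 0; start < text.length(); start++) {
            for (int n = 1; n <= maxN && start + n <= text.length(); n++) {
                counts.merge(text.substring(start, start + n), 1, Integer::sum);
            }
        }
        return counts;
    }

    // Remove every n-gram whose count is below minCount.
    static void prune(Map<String, Integer> counts, int minCount) {
        counts.values().removeIf(c -> c < minCount);
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countNGrams("abababc", 3);
        int before = counts.size();
        prune(counts, 2);
        System.out.println(before + " -> " + counts.size()); // 9 -> 6
    }
}
```

Every substring of a counted n-gram is also counted, and it occurs at least as often. So a count threshold never keeps an n-gram while discarding one of its shorter contexts. That is what lets you raise the n-gram length and still keep the model size in check.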
Thanks, Bob. The goal of the English-Chinese word alignment is to create some TMX files for "translation memory" tools used by translators. We have some MT...
LingPipe 3.7.0 is now available from: http://alias-i.com/lingpipe The only significant change is an update to the MEDLINE DTDs used by the MEDLINE parser....
LingPipe 3.7.0 will generate a warning when compiling all jars because of an issue with non-ASCII chars in one of our java files. It works with Windows...
Hello, I was looking at NGramBoundaryLM and wondering why exactly those begin and end characters are needed and when you'd use NGramBoundaryLM instead of...
... The spelling corrector and HMMs wrap this all up the right way by default. Classifiers require you to specify which to use, and the decision is based on...
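As a rough illustration of what the begin and end characters buy you, here is a plain-Java sketch. It is not how NGramBoundaryLM is implemented; the class name and the '#' marker are hypothetical. It pads a string with a boundary character on each side and lists its n-grams. The padded n-grams let a model condition the first character on "start of string" and assign probability to the string ending where it does.

```java
import java.util.ArrayList;
import java.util.List;

public class BoundaryNGrams {
    // Hypothetical boundary marker chosen for readability; a real model needs
    // a character that can never occur in the training or test text.
    static final char BOUNDARY = '#';

    // Pad the text with one boundary char on each side, then list its n-grams.
    static List<String> boundaryNGrams(String text, int n) {
        String padded = BOUNDARY + text + BOUNDARY;
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + n <= padded.length(); i++) {
            grams.add(padded.substring(i, i + n));
        }
        return grams;
    }

    public static void main(String[] args) {
        // "#t" models which chars begin a string; "o#" models the string ending.
        System.out.println(boundaryNGrams("to", 2)); // [#t, to, o#]
    }
}
```

Without the trailing boundary, a model has no way to prefer "to" over "tox" as a complete token. Scoring the end marker is what makes the estimates behave like a distribution over whole strings rather than over open-ended prefixes.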
A user reported a problem with HMM decoding on a specific hardware/software configuration: Athlon64/Ubuntu 8.1 64-bit/JDK 1.6 An exception is thrown in the...
Bob - I recently got burned by http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6196102 going from Java 1.5 to Java 1.6 (1.6.0_11-b03, 32-bit, Linux on Intel...
... Thanks for the tip. I should've tried forcing interpreted mode to help debug. My 64-bit java doesn't have a client JVM, only a server one. I ran into some...
Out of curiosity, what happens when you hit this bug? Does the JVM just die (HotSpot error), or? Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr -...
Yeah, I guess they consider {@code <= Integer.MAX_VALUE} a rare case or something, and it's been fixed and regressed, according to our cross-JVM build experience....
... The problem was that a variable that's mathematically bounded to be between 0 and 1 with infinite precision wanders outside of that range with (some)...
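The message doesn't say how the bug was fixed. One common defensive pattern is to clamp such a value back into [0, 1] before it is used, for example before taking a log. This is an assumption here, not necessarily what LingPipe does. Here is a minimal plain-Java sketch; the class and method names are hypothetical.

```java
public class ProbabilityClamp {
    // Force a value that is mathematically in [0, 1] back into that range
    // after floating-point round-off has pushed it slightly outside.
    static double clampToUnitInterval(double x) {
        if (Double.isNaN(x)) throw new IllegalArgumentException("NaN probability");
        return Math.max(0.0, Math.min(1.0, x));
    }

    public static void main(String[] args) {
        double drifted = -1e-17;  // should be 0, but round-off made it negative
        System.out.println(Math.log(drifted));                      // NaN
        System.out.println(Math.log(clampToUnitInterval(drifted))); // -Infinity
    }
}
```

The point is that a tiny excursion outside the range doesn't stay tiny. The log of a negative number is NaN, and NaN silently poisons every downstream sum or comparison. Clamping at least keeps the value in a range the rest of the code was written for.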
Hi: I am a LingPipe newbie. I read some tutorials on the LingPipe website, but many NLP terms frightened me. The classification samples included in the...
... There are also classifiers used in the language ID, word sense disambiguation, sentiment, and logistic regression tutorials. The word sense disambiguation...
... I am planning to crawl universities, research labs and job websites to get many researchers' profile information and then provide a search interface for...