Re: Chinese Token Demo Bug patch (CompiledSpellChecker.setTokenSet)



On November 12, 2008, suechen275 wrote:

> Hi, I am first time to use LingPipe. I tried ChineseToken sample
> that comes with LingPipe 3.6.0 with icwb2-data.zip. It doesn't work.
> It throws exception: ...

It sure does. Thanks for the detailed bug report.

The culprit is the following file:

$LINGPIPE/src/com/aliasi/spell/CompiledSpellChecker.java

The method setTokenSet() should be:

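public final void setTokenSet(Set<String> tokenSet) {
    // Return early on a null token set; the loop below would
    // otherwise throw a NullPointerException.
    if (tokenSet == null) return;
    // Length of the longest token (computed but not used in this snippet).
    int maxLen = 0;
    for (String token : tokenSet)
        maxLen = java.lang.Math.max(maxLen, token.length());
    mTokenSet = tokenSet;
    // tokenSet is non-null here, so the null branch is never taken.
    mTokenPrefixTrie = tokenSet == null ? null : prefixTrie(tokenSet);
}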

You can patch and recompile. Our next release will
include the patch.
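
To see why the null check matters, here is a minimal, self-contained sketch. TokenSetHolder is a hypothetical stand-in, not LingPipe's CompiledSpellChecker: it only mirrors the shape of the method above, leaves out the prefix trie, and the "unguarded" variant is an assumption about what the method looks like without the check.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for illustration only; not LingPipe's CompiledSpellChecker.
final class TokenSetHolder {
    private Set<String> mTokenSet;
    private int mMaxLen;

    // Assumed shape without the guard: the for-each loop dereferences
    // tokenSet, so passing null throws a NullPointerException.
    void setTokenSetUnguarded(Set<String> tokenSet) {
        int maxLen = 0;
        for (String token : tokenSet)
            maxLen = Math.max(maxLen, token.length());
        mTokenSet = tokenSet;
        mMaxLen = maxLen;
    }

    // Guarded shape: a null argument is a no-op and leaves the state unchanged.
    void setTokenSet(Set<String> tokenSet) {
        if (tokenSet == null) return;
        setTokenSetUnguarded(tokenSet);
    }

    Set<String> tokenSet() { return mTokenSet; }

    int maxLen() { return mMaxLen; }

    public static void main(String[] args) {
        TokenSetHolder holder = new TokenSetHolder();
        holder.setTokenSet(null); // returns quietly
        try {
            holder.setTokenSetUnguarded(null);
        } catch (NullPointerException e) {
            System.out.println("unguarded call threw NullPointerException");
        }
        Set<String> tokens = new HashSet<>();
        tokens.add("ab");
        tokens.add("abcd");
        holder.setTokenSet(tokens);
        System.out.println("longest token length = " + holder.maxLen());
    }
}
```

Note that with the early return, passing null does not clear a token set that was set earlier; it simply does nothing.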

Here's the output of the revised program in the
Chinese Tokens demo:

\lingpipe\demos\tutorial\chineseTokens>ant -Ddata.sighan05=\lingpipe-3.6.0\demos\data\sighan2005\dist run-cityu05

Buildfile: build.xml

compile:
[mkdir] Created dir: c:\carp\mycvs\lingpipe\demos\tutorial\chineseTokens\build\classes
[javac] Compiling 4 source files to c:\carp\mycvs\lingpipe\demos\tutorial\chineseTokens\build\classes
[javac] Note: Some input files use unchecked or unsafe operations.
[javac] Note: Recompile with -Xlint:unchecked for details.

run-cityu05:
[java] CHINESE TOKENS 2005
[java] Data Zip File=c:\carp\test\lingpipe-3.6.0\demos\data\sighan2005\dist\icwb2-data.zip
[java] Corpus Name=cityu
[java] Output File Name=cityu.segments
[java] Known Tokens File Name=cityu.knownWords
[java] Max N-gram=5
[java] Lambda factor=5.0
[java] Num chars=5000
[java] Max n-best=1024
[java] Reading Data from entry=icwb2-data/training/cityu_training.utf8
[java] Found 53019 sentences.
[java] Found 4922 distinct characters.
[java] Found 69086 distinct tokens.
[java] Testing Results. Zip Entry=icwb2-data/gold/cityu_test_gold.utf8
[java] Found 1493 test sentences.
[java] Found 9001 test tokens.
[java] Found 1670 unknown test tokens.
[java] Found 2702 test characterss. Found 60 unknown test characters.
[java]
[java] Reference/Response Token Length Histogram
[java] Length, #REF, #RESP, Diff
[java] 1, 19115, 19541, 426
[java] 2, 18187, 17980, -207
[java] 3, 2682, 2459, -223
[java] 4, 759, 800, 41
[java] 5, 116, 158, 42
[java] 6, 36, 63, 27
[java] 7, 22, 24, 2
[java] 8, 9, 11, 2
[java] 9, 5, 4, -1
[java] Scores
[java] EndPoint: P=0.9686796936221845 R=0.9715039042693439 F=0.9700897434274647
[java] Chunk: P=0.9265060534457139 R=0.929108852843463 F=0.927805627721468

BUILD SUCCESSFUL
Total time: 59 seconds
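
In the histogram in the output above, the Diff column equals #RESP minus #REF for each token length (for example, 19541 - 19115 = 426). A rough sketch of how such a table could be built follows. LengthHistogram and its methods are hypothetical names, the token lists are toy data with Latin letters standing in for Chinese characters, and this is not the code the demo actually uses.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Illustrative sketch only; not the code used by the LingPipe demo.
final class LengthHistogram {

    // Counts how many tokens there are of each length.
    // String.length() counts UTF-16 code units.
    static Map<Integer, Integer> histogram(List<String> tokens) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (String token : tokens)
            counts.merge(token.length(), 1, Integer::sum);
        return counts;
    }

    // Per-length difference: response count minus reference count.
    static Map<Integer, Integer> diff(Map<Integer, Integer> ref,
                                      Map<Integer, Integer> resp) {
        TreeSet<Integer> lengths = new TreeSet<>(ref.keySet());
        lengths.addAll(resp.keySet());
        Map<Integer, Integer> result = new TreeMap<>();
        for (int len : lengths)
            result.put(len, resp.getOrDefault(len, 0) - ref.getOrDefault(len, 0));
        return result;
    }

    public static void main(String[] args) {
        // Toy data: the response splits the reference token "CD" in two.
        List<String> reference = Arrays.asList("AB", "CD", "EF");
        List<String> response = Arrays.asList("AB", "C", "D", "EF");
        Map<Integer, Integer> ref = histogram(reference);
        Map<Integer, Integer> resp = histogram(response);
        System.out.println("Length, #REF, #RESP, Diff");
        for (Map.Entry<Integer, Integer> e : diff(ref, resp).entrySet()) {
            int len = e.getKey();
            System.out.println(len + ", " + ref.getOrDefault(len, 0) + ", "
                               + resp.getOrDefault(len, 0) + ", " + e.getValue());
        }
    }
}
```

A positive Diff at length 1 together with negative values at lengths 2 and 3, as in the run above, suggests the segmenter tends to over-split on this data set.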

(The other data sets had better performance, as
indicated in the tutorial).
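
The F values in the Scores output are consistent with the balanced F-measure, F = 2PR/(P+R), applied to the reported precision and recall. The sketch below only checks that arithmetic. FMeasure and f1 are hypothetical names, and how LingPipe computes these scores internally is not shown in this message.

```java
// Illustrative check of the reported scores; not LingPipe code.
final class FMeasure {

    // Balanced F-measure: the harmonic mean of precision and recall.
    static double f1(double precision, double recall) {
        if (precision + recall == 0.0) return 0.0; // avoid dividing by zero
        return 2.0 * precision * recall / (precision + recall);
    }

    public static void main(String[] args) {
        // Precision and recall values copied from the demo output above.
        System.out.println("EndPoint F=" + f1(0.9686796936221845, 0.9715039042693439));
        System.out.println("Chunk F=" + f1(0.9265060534457139, 0.929108852843463));
    }
}
```

Both results should agree with the F values printed by the demo to at least six decimal places.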

- Bob Carpenter
Alias-i



Thu Nov 13, 2008 9:12 pm

On Nov 14, 2008, 4:38 pm, Sue Chen (suelingpipe) wrote:

Thanks much, Bob. The patch fixed my problem.