Sue Chen wrote:
> ...
> I see LingPipe has functionality to extract English sentences from text.
> But it doesn't have this for Chinese. Do you know of any tools that do this
> for Chinese?
We did this for Chinese in the past by extending
sentences.HeuristicSentenceModel with the appropriate end tokens
for Chinese and using the tokenizer.CharacterTokenizerFactory.
I can't remember offhand the Unicode code point of the small circle
used as the Chinese end-of-sentence mark, and we didn't try to
get fancy with sequences of punctuation, but it worked well
for the corpus we had.
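As a rough illustration of the approach described above, here is a minimal standalone sketch in Python. It does not use LingPipe or its API; it only mimics the idea of treating each character as a token and closing a sentence at an end token. The end-token set is an assumption: U+3002 (ideographic full stop, presumably the small circle mentioned above) plus the fullwidth exclamation and question marks. The function name split_sentences is hypothetical.

```python
# Standalone sketch of character-level sentence splitting.
# This is NOT LingPipe code; it only illustrates the idea.

# Assumed end-of-sentence characters (not the original model's token set):
# U+3002 ideographic full stop, U+FF01 fullwidth '!', U+FF1F fullwidth '?'.
END_TOKENS = {"\u3002", "\uff01", "\uff1f"}


def split_sentences(text):
    """Split text into sentences, treating every character as one token."""
    sentences = []
    current = []
    for ch in text:
        current.append(ch)
        if ch in END_TOKENS:
            # An end token closes the sentence collected so far.
            sentence = "".join(current).strip()
            if sentence:
                sentences.append(sentence)
            current = []
    # Keep trailing text that has no final end token.
    tail = "".join(current).strip()
    if tail:
        sentences.append(tail)
    return sentences
```

Like the setup described above, this sketch does nothing clever with runs of punctuation: each end token closes a sentence on its own.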
> Does LingPipe do chunking for English sentences?
> For example,
> [The bald man] [was sitting] [on his suitcase].
It can find the VPs and NPs, but you either have to define
them in terms of parts of speech (see the part-of-speech
tutorial), or you have to train a named-entity chunker
ahead of time on text annotated with the chunks.
This isn't language dependent, but you need a part-of-speech
tagger for the language in question, which requires
training data.
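To make the part-of-speech route concrete, here is a minimal standalone sketch in Python, again not LingPipe code. It assumes the sentence has already been tagged (the Penn Treebank-style tags below are assigned by hand for illustration) and groups maximal runs of noun-like or verb-like tags into chunks. The tag sets and the names chunk_type and chunk are hypothetical choices for this example.

```python
# Standalone sketch: define NP and VP chunks in terms of part-of-speech tags.
# This is NOT LingPipe code; tag sets are illustrative assumptions.

NP_TAGS = {"DT", "JJ", "NN", "NNS", "PRP$"}
VP_TAGS = {"MD", "VB", "VBD", "VBG", "VBN", "VBP", "VBZ"}


def chunk_type(tag):
    """Map a part-of-speech tag to a chunk type, or None if outside any chunk."""
    if tag in NP_TAGS:
        return "NP"
    if tag in VP_TAGS:
        return "VP"
    return None


def chunk(tagged):
    """Group maximal runs of same-type tokens into (type, text) chunks."""
    chunks = []
    current_type = None
    current_words = []
    for word, tag in tagged:
        t = chunk_type(tag)
        if t != current_type:
            # The chunk type changed, so close the chunk in progress.
            if current_type is not None:
                chunks.append((current_type, " ".join(current_words)))
            current_type = t
            current_words = []
        if t is not None:
            current_words.append(word)
    if current_type is not None:
        chunks.append((current_type, " ".join(current_words)))
    return chunks


# Hand-tagged version of the example sentence from the question.
TAGGED = [("The", "DT"), ("bald", "JJ"), ("man", "NN"),
          ("was", "VBD"), ("sitting", "VBG"),
          ("on", "IN"), ("his", "PRP$"), ("suitcase", "NN")]
```

With these tag sets, chunk(TAGGED) yields an NP "The bald man", a VP "was sitting", and an NP "his suitcase". The preposition "on" falls outside any chunk, so this differs from the bracketing in the question, which groups "on his suitcase" together.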
- Bob Carpenter
Alias-i