I am trying to create an application that builds a word-based language model incrementally as it is exposed to word sequences. The model needs to be saved in...
... There is a compileTo(ObjectOutput objOut) method for TokenizedLM that allows for serialization. Deserializing however does yield a trainable TokenizedLM,...
As I added more documentation for the next release, I realized that what I said last time was wrong. The compiled form of a NaiveBayesClassifier is an...
How Techniques Behind NLP (Natural Language Processing) Driven Observations Influence the Quality of Observations for Use in Financial Models Breck Baldwin,...
All, We have 5 MBA students focused on improving LingPipe products and services. They will need about 15 minutes of your time to fill out an anonymous online...
Could this be consulting on mlp/ machine learning implementations in general or does it have to be in lingpipe details? From following group I know you guys...
... I am a better computational linguist than machine learning guru but I would not restrict solutions/feedback to just LingPipe tools. We point customers to...
I assume by "DLM" you mean "dynamic language model" (that's our terminology, not an industry standard). In which case, many of our tutorials can help you out: ...
... Hi Yes, I meant "Language models". I have used it in one of my projects, now it giving accuracy better than all other classifiers. I want to understand how...
The javadoc for the various systems has full definitions. The short answer is that you build a language model p(text|cat) for each category c, then a...
I apologize in advance if this sounds a little basic, I just started using lingpipe to add ner to a tool I'm building and I'm not really an nlp person. I'm...
The source of the problem is a mismatch between the MUC training data and the data you have. For instance, MUC's all in ASCII and I doubt the ampersand (&)...
Hi In following example space tokenizer is implemented with "\92;S+", what does it mean?? Dont you think it should be like "\92;s+"? Or this S(capital S) is used...
The tokenizer just matches sequences of non-white space characters of length at least 1 to constitute a token. Characters not matching will be token...
... Thanks, I got confused because I thought it is working like "split" function in java. Thank you -- Regards... Sangeeta [Non-text portions of this message...
It's about normalization of probability estimates. For the process model, SUM_{strings s of length N} prob(s) = 1. For the boundary model, SUM_{strings s}...
While browsing around the Lingpipe site a couple weeks ago, I recall seeing a reference to a contrib module or something that was an interface for training...
Hi, As a part of my academic research project, I am trying to build an application wherein I will have a set of urls retrieved from the web. The task is...
Yes. You can use LingPipe's com.aliasi.util.Compilable interface to compile the model to a file or Java's java.io.Serializable interface to serialize the...
I would love to train a language model in Lingpipe to use in speech recognition engine like CMU's Sphinx4. Any thoughts on exporting a Language Model from...