Hi, I am working on a project whose aim is to recognize postal addresses in Spanish text documents. Addresses can come in different formats, for example: *...
... The good news is that you can use LingPipe to do this. The bad news is that you'll need training data. The good news is that we just built an interface to ...
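For what it's worth, here is a minimal sketch of what annotated address training data might look like. It assumes a BIO tagging scheme (B-ADDR on the first token of an address, I-ADDR on the rest, O elsewhere). BIO is a common convention for chunker training data, but it is not necessarily the format LingPipe expects. The class name `BioEncoder` and the pre-tokenized input are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper: turns a tokenized sentence plus one annotated
 * address span into BIO tags. Not a LingPipe class.
 */
final class BioEncoder {
    /**
     * @param tokens    the tokenized sentence
     * @param spanStart index of the first address token (inclusive)
     * @param spanEnd   index one past the last address token (exclusive)
     */
    static List<String> encode(List<String> tokens, int spanStart, int spanEnd) {
        if (spanStart < 0 || spanEnd > tokens.size() || spanStart >= spanEnd) {
            throw new IllegalArgumentException("invalid span");
        }
        List<String> tags = new ArrayList<>(tokens.size());
        for (int i = 0; i < tokens.size(); i++) {
            if (i == spanStart) {
                tags.add("B-ADDR");        // beginning of the address
            } else if (i > spanStart && i < spanEnd) {
                tags.add("I-ADDR");        // inside the address
            } else {
                tags.add("O");             // outside any address
            }
        }
        return tags;
    }
}
```

For example, the tokens `Vivo en Calle Mayor 12 , Madrid .` with the address covering tokens 2 through 6 come out as `O O B-ADDR I-ADDR I-ADDR I-ADDR I-ADDR O`.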
LingPipe 3.1.2 is now out. It's a minor revision of 3.1.1. Here's the change list: Medline Citation Parser: Properly handle reprinted-in elements. TF/IDF...
This somehow slipped through and I just found it in my unanswered mail bin. Sorry for taking so long to respond. (Please re-mail questions or mail me directly...
Dear Sirs, I have two sentences: S1 = W1 W2 W3 W4 ... Wn and S2 = Y1 Y2 Y3 Y4 ... Yn. I want to check whether sentences S1 and S2 are equal (with fuzzy-like...
... I've been meaning to write a tutorial on string comparison, as it's now relevant for all of our feature-based classifiers. So expect that at some future...
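In the meantime, one simple way to get fuzzy-like sentence equality is to compare the sentences' token sets with Jaccard similarity and accept pairs above a threshold. This is a plain-Java sketch, not LingPipe's own string-comparison classes; the whitespace tokenizer and the threshold are assumptions you would tune for your data.

```java
import java.util.HashSet;
import java.util.Set;

final class SentenceSimilarity {
    /** Lowercases and splits on whitespace: a stand-in for a real tokenizer. */
    static Set<String> tokens(String s) {
        Set<String> out = new HashSet<>();
        for (String t : s.toLowerCase().trim().split("\\s+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    /** Jaccard similarity |A ∩ B| / |A ∪ B| over token sets, in [0, 1]. */
    static double jaccard(String s1, String s2) {
        Set<String> a = tokens(s1);
        Set<String> b = tokens(s2);
        if (a.isEmpty() && b.isEmpty()) {
            return 1.0; // two empty sentences are treated as identical
        }
        Set<String> inter = new HashSet<>(a);
        inter.retainAll(b);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return (double) inter.size() / union.size();
    }

    /** "Fuzzy equal" means the similarity reaches the (assumed) threshold. */
    static boolean fuzzyEqual(String s1, String s2, double threshold) {
        return jaccard(s1, s2) >= threshold;
    }
}
```

Here `the cat sat` vs. `the dog sat` scores 0.5: two shared tokens out of four distinct ones. Token sets ignore word order, so if order matters, an edit distance over the token sequences is the natural alternative.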
Hi all. I've been working with LingPipe classifiers a bit, and have previously used a NaiveBayesClassifier with an IndoEuropeanTokenizerFactory, created as: ...
Hi, a recent email talked about using LingPipe to measure sentence similarity... is there also a feature in LingPipe to measure word similarity? Would someone...
... This is the right way to do this. ... My guess is that one of the character language model classifiers would work better. Just use: classifier =...
... This depends on your notion of similarity. If it's textual similarity, then yes. There are a bunch of similarity measures built into LingPipe, including...
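If the similarity you have in mind is character-level, the classic measure is Levenshtein edit distance. This is a self-contained sketch, not LingPipe's implementation; normalizing by the longer word's length to get a score in [0, 1] is one choice among several.

```java
final class WordDistance {
    /** Levenshtein distance: fewest single-character insertions, deletions, or substitutions. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1,   // insertion
                                            prev[j] + 1),      // deletion
                                   prev[j - 1] + cost);        // substitution or match
            }
            int[] tmp = prev; // reuse the two rows
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Similarity in [0, 1]: 1 - distance / length of the longer word. */
    static double similarity(String a, String b) {
        int max = Math.max(a.length(), b.length());
        return max == 0 ? 1.0 : 1.0 - (double) levenshtein(a, b) / max;
    }
}
```

For example, `levenshtein("kitten", "sitting")` is 3. This captures spelling similarity only; semantically related but differently spelled words will still score as dissimilar.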
Hello, I am getting the following exception when trying to extract information from a text. What does that "boundary char" mean? Is there any way to escape...
Hi, I am a librarian working at a biomedical research institute in Africa. I am interested in getting MEDLINE data from a MySQL database, using the PMID. I have...
... I can't help you get info out of a MySQL DB. You'd need JDBC for that, not LingPipe. If you want to get MEDLINE data out of MEDLINE's XML...
Hi all - I am pretty new to the search space, so I apologize if my question is at a very basic level. I have a huge collection of data/articles from many...
... Lucene's a search engine and LingPipe's a suite of natural language processing tools. The functionality they offer is largely complementary, but as both ...
Bob, Thanks for the response. My dilemma is in the order of processing: LingPipe first and then Lucene? Or Lucene first and then LingPipe? Does it matter? ...
I'm performing multi-category classification by creating a BinaryLMClassifier for each class, which should return true or false for each input string. For a...
Hi Abby, Without going into too much detail about the how and why, I'd stick with a single Lucene index that includes data obtained via LingPipe, e.g. type=movie...
... The advantage of doing LingPipe classification and then indexing everything in Lucene is that you can use the fielded Lucene index to search combinations...
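To make the classify-then-index flow concrete, here is a toy in-memory fielded index standing in for Lucene; it does not use the Lucene API. Each document goes through a classifier before indexing, and the predicted category is stored as its own field, so a query can combine a category constraint with a text term. The `classifier` function and the field names `category` and `text` are hypothetical.

```java
import java.util.*;
import java.util.function.Function;

/** Toy fielded inverted index: field -> term -> ids of matching documents. */
final class FieldedIndex {
    private final Map<String, Map<String, Set<Integer>>> postings = new HashMap<>();
    private final Function<String, String> classifier; // hypothetical stand-in for a trained classifier
    private int nextId = 0;

    FieldedIndex(Function<String, String> classifier) {
        this.classifier = classifier;
    }

    private void post(String field, String term, int id) {
        postings.computeIfAbsent(field, f -> new HashMap<>())
                .computeIfAbsent(term, t -> new TreeSet<>())
                .add(id);
    }

    /** Classify first, then index both the assigned category and the text tokens. */
    int add(String text) {
        int id = nextId++;
        post("category", classifier.apply(text), id);
        for (String tok : text.toLowerCase().split("\\W+")) {
            if (!tok.isEmpty()) {
                post("text", tok, id);
            }
        }
        return id;
    }

    private Set<Integer> lookup(String field, String term) {
        return postings.getOrDefault(field, Map.of()).getOrDefault(term, Set.of());
    }

    /** Conjunctive query: documents in the given category that contain the term. */
    Set<Integer> search(String category, String term) {
        Set<Integer> hits = new TreeSet<>(lookup("category", category));
        hits.retainAll(lookup("text", term.toLowerCase()));
        return hits;
    }
}
```

The point of the design is that the classifier runs once per document at indexing time, and after that category filters are just another field lookup at query time.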
... That's exactly right. ... You can also use a two-class classifier with an explicit positive and negative model. The negative models are sometimes called...
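Here is a minimal sketch of the explicit positive/negative setup. It substitutes a simple character bigram language model with add-one smoothing for LingPipe's character LMs: text is accepted when the positive model assigns it lower per-character cross-entropy than the negative model does. The class names and the smoothing scheme are illustrative assumptions, not LingPipe's classes.

```java
import java.util.HashMap;
import java.util.Map;

/** Character bigram LM with add-one smoothing over all UTF-16 chars (a simplification). */
final class CharBigramLM {
    private static final int ALPHABET = 65536;                    // assumed alphabet size
    private static final String BOS = String.valueOf((char) 0);   // start-of-text marker
    private final Map<String, Integer> bigrams = new HashMap<>();
    private final Map<Character, Integer> contexts = new HashMap<>();

    void train(String text) {
        String s = BOS + text;
        for (int i = 1; i < s.length(); i++) {
            bigrams.merge(s.substring(i - 1, i + 1), 1, Integer::sum);
            contexts.merge(s.charAt(i - 1), 1, Integer::sum);
        }
    }

    /** Log (base 2) probability of the text under the smoothed model. */
    double log2Prob(String text) {
        String s = BOS + text;
        double total = 0.0;
        for (int i = 1; i < s.length(); i++) {
            int joint = bigrams.getOrDefault(s.substring(i - 1, i + 1), 0);
            int ctx = contexts.getOrDefault(s.charAt(i - 1), 0);
            total += Math.log((joint + 1.0) / (ctx + ALPHABET)) / Math.log(2);
        }
        return total;
    }

    /** Per-character cross-entropy in bits; lower means a better fit. */
    double crossEntropy(String text) {
        return text.isEmpty() ? 0.0 : -log2Prob(text) / text.length();
    }
}

/** Accept/reject by comparing a positive model against a negative model. */
final class TwoModelClassifier {
    final CharBigramLM positive = new CharBigramLM();
    final CharBigramLM negative = new CharBigramLM();

    boolean accept(String text) {
        return positive.crossEntropy(text) < negative.crossEntropy(text);
    }
}
```

Training each model on its own examples and comparing cross-entropies gives the decision a built-in baseline; with only an accept model, you would typically need an absolute cross-entropy threshold instead.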
... True... I'll probably stick with just the accept model. ... Ya. I meant to say it made cross-entropy scores "worse" rather than "dropping" them. ... I'll...
Quick question on the usage of KnnClassifier. My implementation ... public static int K = 5; public static boolean WEIGHT_BY_PROXIMITY = true; private...
... In general, you can find use cases in the corresponding unit tests. Unfortunately, these don't always anticipate users' needs well enough, so you wind up...
... TokenFeatureExtractor(tokenizer); ... KnnClassifier(extractor,K,proximity,WEIGHT_BY_PROXIMITY); ... I see. The reason I had gravitated toward Jaccard...
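For comparison, here is a standalone sketch of k-nearest-neighbor classification using Jaccard proximity over token sets, with optional proximity-weighted voting that mirrors the K and WEIGHT_BY_PROXIMITY settings above. It does not use LingPipe's KnnClassifier or TokenFeatureExtractor, and the `\W+` tokenizer is an assumption.

```java
import java.util.*;

final class JaccardKnn {
    private final List<Set<String>> examples = new ArrayList<>();
    private final List<String> labels = new ArrayList<>();
    private final int k;
    private final boolean weightByProximity;

    JaccardKnn(int k, boolean weightByProximity) {
        this.k = k;
        this.weightByProximity = weightByProximity;
    }

    static Set<String> tokens(String text) {
        Set<String> out = new HashSet<>();
        for (String t : text.toLowerCase().split("\\W+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    /** Jaccard proximity: shared tokens divided by distinct tokens overall. */
    static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) return 1.0;
        Set<String> inter = new HashSet<>(a);
        inter.retainAll(b);
        int union = a.size() + b.size() - inter.size();
        return (double) inter.size() / union;
    }

    void train(String label, String text) {
        labels.add(label);
        examples.add(tokens(text));
    }

    /** Returns the label with the most (optionally proximity-weighted) votes among the k nearest. */
    String classify(String text) {
        Set<String> query = tokens(text);
        int n = examples.size();
        Integer[] order = new Integer[n];
        double[] prox = new double[n];
        for (int i = 0; i < n; i++) {
            order[i] = i;
            prox[i] = jaccard(query, examples.get(i));
        }
        Arrays.sort(order, (x, y) -> Double.compare(prox[y], prox[x])); // most similar first
        Map<String, Double> votes = new HashMap<>();
        for (int r = 0; r < Math.min(k, n); r++) {
            int i = order[r];
            double w = weightByProximity ? prox[i] : 1.0;
            votes.merge(labels.get(i), w, Double::sum);
        }
        String best = null;
        double bestScore = -1.0;
        for (Map.Entry<String, Double> e : votes.entrySet()) {
            if (e.getValue() > bestScore) {
                best = e.getKey();
                bestScore = e.getValue();
            }
        }
        return best; // null if there is no training data
    }
}
```

With `weightByProximity` off, each of the k neighbors gets one vote; with it on, a neighbor sharing no tokens with the query contributes nothing. Ties are broken arbitrarily.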
Thanks Otis and Bob. So, as per your suggestion, the flow of data would be: the crawled data must first be run against the classifiers, and then be indexed...
... Right. ... You should only use separate indexes if you are not going to need queries that cross databases. You could also just put the domain info in its...
Hi, I am trying to use LingPipe to implement sentiment analysis over reviews of electronic items such as processors. I need to get the training data for this...
... What exactly do you want to do? Do you want to find positive and negative reviews? Analyze overall sentiment? Or just tag each document as to positive or...