Hello all, I'm currently using ne-en-news-muc6.AbstractCharLmRescoringChunker on some test corpora. I've noticed that when I'm passing various news articles,...
... Actually, it doesn't respect them at all. Named entity recognition is solely done on a token-by-token level. ... The right way to do what you want to do...
Over the weekend, we began having DNS problems with alias-i.com. We haven't gotten any alias-i.com email, and the LingPipe site is temporarily down. Our...
Hi! I am trying to do coreference resolution in Spanish texts with LingPipe. I have a small training corpus with PERSON/LOCATION/PRONOUN annotated data. I...
... Here is a snippet that chunks text based on paragraph features which then has to be inverted to find valid text for sentence detection. This is fairly...
Hi, I've been searching through Google, the LingPipe API, and the LingPipe Yahoo group archives to no avail. I'm trying to create a TokenizedLM trained not on...
... I'm afraid you're right -- it requires some delving. I've just made the changes for 2.3.2, which we're releasing later next week. I have the Google disks...
I just put LingPipe 2.4.0 up on our web servers. It's a fairly minor change in terms of code, but the upgrades are not 100% backward compatible. I ran tests...
Hi all, this is my first post, I'm a new LingPipe user, but very impressed so far. Kudos on an excellent piece of software! As an early exercise, I'm trying to...
... May I ask what the basis for the classification is? ... OK -- you've got the right intuition here. At a high level, our text classifiers (and everyone ...
Hi Bob, ... Yup: legit blog urls vs. spam-blog urls. I ran across a paper describing classification by training exclusively on tokenized versions of the URLs. ...
David, Do you have a link to the paper you are referring to? I could use this to enhance spammer detection in Simpy (see sig). Thanks, Otis -- Simpy --...
... Exactly right. My bad for not having the reference available. I wasn't trying to be mysterious, I didn't understand it would be of general interest here....
Bob, as a followup to your comments below, I've been looking at the javadoc for ScoredPrecisionRecallEvaluation. I see that it provides "an evaluation of...
... Nope -- that's the right definition. The "operating characteristic" is implicit -- it's just a ranked evaluation. It basically tells you what the...
LingPipe 2.4.1 Released ======================= This is a patch release replacing LingPipe 2.4.0. It patches all bugs that have been reported to us; thanks to...
Hi! I'm working on a system to try to automate the analysis of customer satisfaction based on a database of their e-mail correspondences. So far we've had good...
... We've had requests to do other scalar classifiers, like reading level classification (on a grade-in-school scale). This is a general problem in statistical...
Hi All, I am new to Dynamic Model Language Classification. In fact, my knowledge of how to use LingPipe for classification is very limited, so I am trying...
... Let me reformulate your question to see if I understand it. You have a set S1 of docs in language L1 and a set S2 of docs in language L2. Now you want to...
... It sounds like you're using the chunker: $LINGPIPE/demos/models/ne-en-bio-genia.TokenShapeChunker That is, indeed, a TokenShapeChunker. I'd suggest not...
Hi Bob, Thank you for the big tip earlier. I have some more questions about training Chunkers (LmRescoringChunker, TokenShapeChunker). I made a set of tagged ...