Hello, I am using the LogisticRegressionClassifier for multi-class classification. There is a lot of bias in my training data; a few categories have way more train ...
1212
Bob Carpenter
colloquialdo...
Sep 12, 2011 8:08 pm
Are you sure it's bias, or does the balance of categories reflect the data on which the classifier will be evaluated? In which case it's what we'd call an...
1213
menabad
Sep 15, 2011 3:53 pm
Dear All, I have a question regarding the First-Best and N-Best chunks. What I understood is that First-Best is the selected set of chunks with the...
1214
Bob Carpenter
colloquialdo...
Sep 15, 2011 4:02 pm
That's not quite how it works. I'd suggest reading the tutorial closely for examples of how it's supposed to work and more details on what it's doing. The...
1215
menabad
Sep 15, 2011 4:47 pm
Thanks for the reply ... I get the difference now ... but is it surprising to get some chunks from First-Best and not get them at all from N-Best? Or this case...
1216
colloquialdotcom
colloquialdo...
Sep 15, 2011 5:15 pm
The first-best result should be the first result returned by n-best. - Bob...
1217
menabad
Sep 15, 2011 7:19 pm
Sorry, but I meant the n-best of the confidence chunker: should it contain all the chunks returned by First-Best? Thanks, Mena...
1218
Yogesh
yogesh.pandi...
Oct 1, 2011 7:18 pm
Hello, I am using the LogisticRegressionClassifier. I have two queries. 1. My corpus size is 3000 XValidatingObjectCorpus<Classified<CharSequence>> corpus =...
1219
Bob Carpenter
colloquialdo...
Oct 2, 2011 8:35 pm
Speed Up ... It shouldn't be taking a day to train over 3000 instances using logistic regression. How many categories are there? How many features are non-zero...
1220
Yogesh
yogesh.pandi...
Oct 3, 2011 4:25 pm
I have 110 categories. I am not explicitly specifying the number of features. I let the ChunkerFeatureExtractor do the work. Because I have some custom NEs as...
1221
Bob Carpenter
colloquialdo...
Oct 3, 2011 6:28 pm
110 categories is a large number for logistic regression. The number of feature vectors trained and evaluated is the number of categories minus 1 (e.g., you only...
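To make the scaling concrete, here is a rough sketch in plain Java (not LingPipe code) of how the number of trained coefficients grows with the number of categories, assuming one coefficient vector per non-reference category as described above. The class name and the figure of 50,000 dimensions are hypothetical; the thread does not give the real dimensionality.

```java
/** Back-of-envelope coefficient count for multinomial logistic regression. */
final class CoefficientCount {

    /**
     * Number of coefficients when one category serves as the reference,
     * so only (numCategories - 1) coefficient vectors are trained.
     */
    static long coefficients(int numCategories, int numDimensions) {
        if (numCategories < 2 || numDimensions < 1) {
            throw new IllegalArgumentException("need >= 2 categories and >= 1 dimension");
        }
        return (long) (numCategories - 1) * numDimensions;
    }

    public static void main(String[] args) {
        // 110 categories is from the thread; 50,000 dimensions is a made-up figure.
        System.out.println(coefficients(110, 50_000));
    }
}
```

Under that assumed dimensionality, 110 categories means 109 x 50,000 = 5,450,000 coefficients, compared with 100,000 for a 3-category problem.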
1222
Yogesh
yogesh.pandi...
Oct 3, 2011 6:30 pm
For 30 documents in 3 categories (10 each), ChunkerFeatureExtractor takes 9 min 6 seconds to extract features. - Yogesh
1223
Bob Carpenter
colloquialdo...
Oct 3, 2011 7:31 pm
The feature extraction is independent of the number of categories, but directly related to the number of instances (here documents). - Bob...
1224
Yogesh
yogesh.pandi...
Oct 5, 2011 2:57 pm
I changed to a 64-bit Java 7 JVM and am running it in -server mode. It is faster, but not fast enough. Can I do something like, do ChunkerFeatureExtraction separately...
1225
john_irving_tait
john_irving_...
Oct 5, 2011 3:44 pm
Forgive me if I'm violating group protocols - I'm a newbie ... I have a possible opportunity for a LingPipe expert to support a project in the London, England...
1226
Mike Ross
squawktopus
Oct 5, 2011 3:56 pm
Yogesh, Your original post mentioned that training was taking more than a day, and a more recent post says that the feature extraction takes 9 minutes. Am I ...
1227
Yogesh
yogesh.pandi...
Oct 5, 2011 4:20 pm
Hi Mike, The 9-minute feature extraction was on a sample set of just 30 documents in 3 categories. My full training set is about 2500 documents in 110...
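As a rough sanity check on these numbers, the sketch below extrapolates the sample timing to the full set. It assumes extraction time grows linearly with the number of documents (as noted earlier in the thread, it depends on the number of instances, not categories) and that the full set's documents are about as long as the sample's. The class and method names are hypothetical.

```java
import java.time.Duration;

/** Linear extrapolation of a measured feature-extraction time. */
final class ExtractionEstimate {

    /** Scales a measured time in proportion to the number of documents. */
    static Duration extrapolate(Duration sampleTime, int sampleDocs, int targetDocs) {
        if (sampleDocs <= 0 || targetDocs < 0) {
            throw new IllegalArgumentException("document counts must be positive");
        }
        long millis = sampleTime.toMillis() * targetDocs / sampleDocs;
        return Duration.ofMillis(millis);
    }

    public static void main(String[] args) {
        // 9 min 6 s for 30 documents, extrapolated to about 2500 documents.
        Duration sample = Duration.ofMinutes(9).plusSeconds(6);
        Duration full = extrapolate(sample, 30, 2500);
        System.out.println(full.getSeconds() + " seconds");
    }
}
```

Under those assumptions the full set would need about 45,500 seconds (roughly 12.6 hours) for feature extraction alone.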
1228
Mike Ross
squawktopus
Oct 5, 2011 6:08 pm
Hi Yogesh: The reason I was interested in the text output is that it will tell us the number of dimensions, which cannot be determined from your description so...
1229
Yogesh
yogesh.pandi...
Oct 5, 2011 8:54 pm
Hi Mike, Here is the text; 30 documents, 3 categories. I tried with the larger set, but it is taking time. ... com.aliasi.features.AddFeatureExtractor ... 4:17...
1230
Bob Carpenter
colloquialdo...
Oct 5, 2011 9:18 pm
OK -- please send your data if you don't mind. No way this should be taking this long. I'll have time to look at it Thursday (tomorrow), but otherwise won't be...
1231
russell.reeves62
russell.reev...
Oct 17, 2011 4:30 pm
I tried the ftp link in the Named Entity Tutorial but the link seems to be broken. Google search was fruitless. Any idea where I can get a copy of genetag.tag?...
1232
reckb
Oct 17, 2011 4:53 pm
What is the failed link? The only ftp link on the page is ftp://ftp.ncbi.nlm.nih.gov/pub/lsmith/MedTag/medtag.tar.gz and it works fine for me. Breck...
1233
Bob Carpenter
colloquialdo...
Oct 17, 2011 7:07 pm
I just wanted to report back to the group that as I suspected, the speed issue was not with logistic regression training per se -- the time was all taken in...
1234
Yogesh
yogesh.pandi...
Oct 17, 2011 7:36 pm
Feature extraction is a one-time thing (for me). I was wondering if I can write the feature Map<String, Number> to files for each document in train and test...
1235
Bob Carpenter
colloquialdo...
Oct 17, 2011 7:47 pm
Yes, you can do that. I'll try to do something like this myself, but am on the road and won't be back in the office for two weeks. You need to do two things: ...
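A minimal plain-Java sketch of the caching idea: write each document's feature map to a file once, then read it back instead of re-running extraction. It does not use any LingPipe API and is not a reconstruction of the two steps mentioned above. The class name FeatureCache, the tab-separated format, and the one-file-per-document layout are all assumptions for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper that stores one feature map per file as "name<TAB>value" lines. */
final class FeatureCache {

    /** Writes the features; assumes feature names contain no tab or newline. */
    static void write(Path file, Map<String, ? extends Number> features) {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, ? extends Number> e : features.entrySet()) {
            lines.add(e.getKey() + "\t" + e.getValue().doubleValue());
        }
        try {
            Files.write(file, lines, StandardCharsets.UTF_8);
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    /** Reads a file produced by write() back into a map, preserving line order. */
    static Map<String, Double> read(Path file) {
        Map<String, Double> features = new LinkedHashMap<>();
        try {
            for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
                if (line.isEmpty()) {
                    continue;
                }
                int tab = line.lastIndexOf('\t');
                features.put(line.substring(0, tab), Double.valueOf(line.substring(tab + 1)));
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return features;
    }
}
```

A feature extractor that looks up these cached maps by document would then replace the slow one at training time; how to wire that into the classifier is left open here.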
1236
Russell Reeves
russell.reev...
Oct 18, 2011 1:39 pm
Sorry. Chrome was giving me: This webpage is not available. The webpage at ftp://ftp.ncbi.nlm.nih.gov/pub/lsmith/MedTag/medtag.tar.gz might be temporarily...
1237
yalanciborsaci
Oct 18, 2011 10:32 pm
Hi, In case of binary classification (a NaiveBayesClassifier with two categories), I'm using ConditionalClassifierEvaluator (i.e.,...
1238
Bob Carpenter
colloquialdo...
Oct 19, 2011 4:35 am
... Good question (both form and content -- I wish everyone was this thorough). It sure seems like for a measure like this it shouldn't matter which ...
1239
Amac Herdagdelen
yalanciborsaci
Oct 19, 2011 11:19 am
Thanks for the answer! However, there is a problem. The areas under the curves in your example really are the same in both one-versus-all directions, 0.83333 --...
1240
Bob Carpenter
colloquialdo...
Oct 19, 2011 2:40 pm
Sorry -- should've calculated the AUC, not just looked at the results. Thanks for the detailed analysis. I didn't get your last e-mail, but yes, order of ties...
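For readers following the discussion of ties, here is a small plain-Java sketch of one common convention: AUC computed as the fraction of (positive, negative) pairs ranked correctly, with tied pairs counted as half. This is an illustration only, not a description of how ConditionalClassifierEvaluator handles ties; the class name and the scores in the usage note are made up.

```java
/** Pairwise (rank-statistic) AUC with ties counted as half a win. */
final class PairwiseAuc {

    /** Fraction of (positive, negative) pairs where the positive item scores higher. */
    static double auc(double[] positiveScores, double[] negativeScores) {
        if (positiveScores.length == 0 || negativeScores.length == 0) {
            throw new IllegalArgumentException("need at least one item per class");
        }
        double wins = 0.0;
        for (double p : positiveScores) {
            for (double n : negativeScores) {
                if (p > n) {
                    wins += 1.0;
                } else if (p == n) {
                    wins += 0.5; // tie: half credit, independent of output order
                }
            }
        }
        return wins / ((double) positiveScores.length * negativeScores.length);
    }

    /** Reverses the ranking, as happens when the other category becomes the reference. */
    static double[] negate(double[] scores) {
        double[] out = new double[scores.length];
        for (int i = 0; i < scores.length; i++) {
            out[i] = -scores[i];
        }
        return out;
    }
}
```

With made-up scores {0.9, 0.8, 0.4} for one category and {0.7, 0.4} for the other, auc gives 0.75. Swapping the roles of the two categories and reversing the ranking with negate gives 0.75 as well, because half credit for ties does not depend on the order in which tied items happen to be listed.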