tagsoup-friends · Friends of TagSoup
Messages

Messages 483 - 512 of 1386
483
XMLReader tagsoup = XMLReaderFactory.createXMLReader("org.ccil.cowan.tagsoup.Parser"); tagsoup.setFeature( ...
Torsten Curdt
tcurdt@...
May 8, 2006
5:21 pm
484
... Ooops. Ouch. /me puts a paper bag over his head. Fortunately, help is on the way! See the next posting. -- John Cowan http://ccil.org/~cowan...
John Cowan
johnwcowan
May 8, 2006
7:22 pm
485
This release, available as usual at http://www.tagsoup.info , fixes a couple of nasty paper-bag bugs in 1.0rc4, and adds a new feature, the --nocolons switch,...
John Cowan
johnwcowan
May 8, 2006
7:24 pm
486
... That was quick! :-) Thanks! -- Torsten...
Torsten Curdt
tcurdt@...
May 9, 2006
1:40 am
487
... I was just about to release it, so adding a fix for your problem (which was obvious once I thought about it, and the result of inadequate testing) was...
John Cowan
johnwcowan
May 9, 2006
2:08 am
488
Hi, I am trying to use TagSoup with StAX but cannot seem to get all the pieces to fit together. I prefer the StAX API to SAX for parsing my document....
zealandes
May 12, 2006
8:59 pm
489
... I found that when I changed implementations to the Codehaus version it just worked! Looking at their code I can see they treat SAXSource as a special case...
zealandes
May 12, 2006
10:23 pm
490
Actually, this is completely bypassing TagSoup, isn't it?...
zealandes
May 12, 2006
11:14 pm
491
... With just these tiny fragments I can't tell. TagSoup is a SAX parser, so it should be possible to pass an org.ccil.cowan.tagsoup.Parser object to anything...
John Cowan
johnwcowan
May 13, 2006
12:32 am
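A rough illustration of the point in message 491 that TagSoup is a SAX parser: anything that takes an org.xml.sax.XMLReader can, in principle, be handed one. The sketch below passes a reader to an identity Transformer through SAXSource. It uses the JDK's built-in XMLReader so that it runs without extra jars; swapping in new org.ccil.cowan.tagsoup.Parser() is the assumed substitution and is not shown in the thread. The class and method names are made up for this example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

public class ReaderToTransformer {
    /** Feeds markup through the given XMLReader and serializes the SAX events back to text. */
    static String serialize(XMLReader reader, String markup) throws Exception {
        // A Transformer created without a stylesheet performs the identity transform.
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        StringWriter out = new StringWriter();
        // SAXSource pairs an XMLReader with the input it should parse.
        SAXSource source = new SAXSource(reader, new InputSource(new StringReader(markup)));
        identity.transform(source, new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Stand-in reader from the JDK; a TagSoup Parser would be passed here instead (assumption).
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        System.out.println(serialize(reader, "<p>Hi <b>there</b></p>"));
    }
}
```

The only line that knows which parser is in use is the one that creates the reader, which is why this kind of hand-off tends to be a quick check that the parser is really on the path.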
492
I think that what I was doing was fundamentally wrong. To bridge between SAX and StAX in a streaming way would require one thread to parse the document and...
John Patterson
zealandes
May 13, 2006
2:05 am
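The idea in message 492, one thread parsing while another consumes, can be sketched with a bounded queue between the two. This is only an illustration under assumptions: the event strings, the end marker and the class name are invented here, and the JDK's own XMLReader stands in for a TagSoup Parser. A real SAX-to-StAX bridge would carry proper event objects rather than strings.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class SaxPullBridge {
    // Hypothetical end-of-stream marker; real events always carry a "start:"/"end:"/"error:" prefix.
    static final String END = "<<end>>";

    /** Starts a producer thread that pushes SAX events into a bounded queue and returns the queue. */
    static BlockingQueue<String> parseInBackground(XMLReader reader, String markup) {
        BlockingQueue<String> events = new LinkedBlockingQueue<>(16);
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                put(events, "start:" + qName);
            }
            @Override
            public void endElement(String uri, String local, String qName) {
                put(events, "end:" + qName);
            }
        });
        Thread producer = new Thread(() -> {
            try {
                reader.parse(new InputSource(new StringReader(markup)));
            } catch (Exception e) {
                put(events, "error:" + e);
            } finally {
                put(events, END); // always tell the consumer that the stream is over
            }
        });
        producer.setDaemon(true);
        producer.start();
        return events;
    }

    // Blocks when the queue is full, so the parser never runs far ahead of the consumer.
    private static void put(BlockingQueue<String> queue, String event) {
        try {
            queue.put(event);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Consumer side: pulls events one at a time until the end marker arrives. */
    static List<String> drain(BlockingQueue<String> events) throws InterruptedException {
        List<String> seen = new ArrayList<>();
        for (String e = events.take(); !e.equals(END); e = events.take()) {
            seen.add(e);
        }
        return seen;
    }

    public static void main(String[] args) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        System.out.println(drain(parseInBackground(reader, "<p>Hi <b>there</b></p>")));
    }
}
```

The bounded queue is the design choice that keeps this streaming: the producer blocks once 16 events are waiting, so memory use does not grow with document size.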
493
... Yes. ... If you use XOM (http://www.xom.nu), you can get either a complete tree or (if you subclass XOM's Builder class) you can control exactly which...
John Cowan
johnwcowan
May 13, 2006
6:46 am
494
TagSoup trekking....across the universe... Am I reading correctly that the TagSoup cognoscenti call bogus HTML tags like <o:blah> "bogons", or is there more to...
Rob Staveley
tom_staveley
May 13, 2006
7:16 am
495
... A bogon is any element that's not in the schema (src/definitions/html.tssl). It may or may not have a colon in its name. By default bogons are assumed to...
John Cowan
johnwcowan
May 13, 2006
10:03 am
496
Hello! I'm using TagSoup for parsing HTML and web-crawling. After parsing about 9000 URLs successfully, TagSoup fails with a NullPointerException on the 38th line of...
izinkovsky
May 18, 2006
12:29 pm
497
... Ouch. Please send me by email (not just an URL, the content is unstable) the document that fails. -- Dream projects long deferred John Cowan...
John Cowan
johnwcowan
May 18, 2006
12:42 pm
498
... about ... unstable) ... I can't reproduce this effect; when I try to reparse the failed pages it's OK. The class begins throwing exceptions at different pages...
izinkovsky
May 18, 2006
1:01 pm
499
... Double ouch. The failure is not data-dependent; it happens when the parser tries to initialize its stack with the dummy element named "<<root>>". I guess...
John Cowan
johnwcowan
May 18, 2006
1:57 pm
500
Hello! I'm creating a new instance of the Parser object for every page. I have 91 MB of links to parse in my list; how can I send it to you? BTW, I use TagSoup in...
Igor Zinkovsky
izinkovsky
May 18, 2006
3:19 pm
501
This release fixed a bunch of bugs around namespaces. The SAX spec was a little hard to follow, so I am now doing a subset of what Xerces does, in hopes that...
John Cowan
johnwcowan
May 20, 2006
6:22 pm
502
I've found several ways to use TagSoup in code. Which one is (more) correct with respect to memory usage and performance? 1. New parser for each page with...
izinkovsky
May 25, 2006
12:22 pm
503
... Obviously safe, costs some memory, shouldn't affect performance except for the cost of creating the schema object. ... Safe provided the schema is not...
John Cowan
johnwcowan
May 25, 2006
12:54 pm
504
Hello! By the third way I mean that _each_ thread will have its own Parser and Schema object, created once in the thread's constructor. In the run method only parse() ...
Igor Zinkovsky
izinkovsky
May 25, 2006
1:49 pm
505
... Yes, that is perfectly safe. As usual there are many wrong ways, but several right ways with different trade-offs. ... Huh. I wonder if you are holding...
John Cowan
johnwcowan
May 25, 2006
2:08 pm
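The arrangement discussed in messages 504 and 505, where each thread builds its own parser once in its constructor and only calls parse() in run(), might look roughly like this. It is a sketch under assumptions: the JDK's XMLReader stands in for a TagSoup Parser and its Schema, pages are passed as in-memory strings rather than fetched, and the class name and element counter are invented for the example.

```java
import java.io.StringReader;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class CrawlWorker extends Thread {
    private final XMLReader reader;   // owned by this thread only, never shared
    private final List<String> pages; // stand-in for the URLs this worker would fetch
    private int elementCount;
    private int failures;

    CrawlWorker(List<String> pages) throws Exception {
        this.pages = pages;
        // Created once per worker; a TagSoup Parser would be constructed here instead (assumption).
        this.reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        this.reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                elementCount++;
            }
        });
    }

    @Override
    public void run() {
        // Only parse() is called here; the same reader is reused for every page.
        for (String page : pages) {
            try {
                reader.parse(new InputSource(new StringReader(page)));
            } catch (Exception e) {
                failures++; // a real crawler would log the failure and move on
            }
        }
    }

    int elementCount() { return elementCount; }
    int failures() { return failures; }

    public static void main(String[] args) throws Exception {
        CrawlWorker a = new CrawlWorker(Arrays.asList("<a><b/><c/></a>", "<x/>"));
        CrawlWorker b = new CrawlWorker(Arrays.asList("<p><q/></p>"));
        a.start();
        b.start();
        a.join(); // join() makes the workers' counters safely visible to this thread
        b.join();
        System.out.println(a.elementCount() + " " + b.elementCount());
    }
}
```

Because no parser object crosses a thread boundary, no locking is needed around parse().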
506
I'm using TagSoup to read HTML e-mails into a Lucene application, which indexes the text content from them. In essence, I have org.ccil.cowan.tagsoup.Parser...
Rob Staveley
tom_staveley
May 25, 2006
2:45 pm
507
... TagSoup already recognizes comments as such; that is, it passes them back only to an optional LexicalHandler, not as character data. The exception is...
John Cowan
johnwcowan
May 25, 2006
3:42 pm
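To illustrate the split described in message 507, where comments go only to an optional LexicalHandler and not to characters(), here is a small text extractor. It is a sketch, not the poster's PlainTextWriter: the class name is invented, and the JDK's XMLReader stands in for a TagSoup Parser. The lexical-handler property URI is the standard SAX2 one; that TagSoup accepts the same property is assumed here rather than shown in the thread.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

public class TextOnlyExtractor {
    /** Returns { text content, comment content } collected from the markup. */
    static String[] extract(XMLReader reader, String markup) throws Exception {
        StringBuilder text = new StringBuilder();
        StringBuilder comments = new StringBuilder();

        // DefaultHandler2 implements both ContentHandler and LexicalHandler.
        DefaultHandler2 handler = new DefaultHandler2() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length); // ordinary character data only
            }
            @Override
            public void comment(char[] ch, int start, int length) {
                comments.append(ch, start, length); // reported separately from text
            }
        };

        reader.setContentHandler(handler);
        // Standard SAX2 property for registering a LexicalHandler.
        reader.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
        reader.parse(new InputSource(new StringReader(markup)));
        return new String[] { text.toString(), comments.toString() };
    }

    public static void main(String[] args) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        String[] parts = extract(reader, "<p>Hello<!-- hidden --> world</p>");
        System.out.println("text=[" + parts[0] + "] comments=[" + parts[1] + "]");
    }
}
```

An indexer that wants only visible text can simply leave the LexicalHandler unregistered, in which case comment content is never delivered at all.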
508
... I am implementing a LexicalHandler and ScanHandler in PlainTextWriter, but comments are handled by the ScanHandler method cmnt, which does nothing. class...
Rob Staveley
tom_staveley
May 25, 2006
6:46 pm
509
... apparent ... Hmmm. OK, I'm in the source tree for 1.0rc6 for the first time, and I'm looking at src/definitions/html.stml, but there's nothing very ...
Rob Staveley
tom_staveley
May 25, 2006
7:01 pm
510
I see you setFlags(0) the ElementTypes for "script" and "style" in the static HTMLSchema instance in CommandLine when you set --nocdata, but I can't see where...
Rob Staveley
tom_staveley
May 25, 2006
7:24 pm
511
... HTMLSchema is generated from a Java template in src/templates/org/ccil/cowan/tagsoup/HTMLSchema.java and the XML file src/definitions/html.stml. There is...
John Cowan
johnwcowan
May 25, 2006
8:04 pm
512
When you talk about removing "type='cdata' attributes" from src/definitions/html.stml, do you mean commenting out <action id='A_CDATA'/> ? Sorry to be such a...
Rob Staveley
tom_staveley
May 25, 2006
10:56 pm