The W3C Technical Architecture Group has an open issue (http://www.w3.org/2001/tag/issues.html?type=1#TagSoupIntegration-54) on the relationship of HTML, XHTML...
... John Cowan has such a set in his personal possession. However since it's taken from real world web pages, distributing it would involve massive copyright...
Elliotte Harold
elharo@...
Mar 2, 2007 4:17 pm
702
... AFAIK TagSoup and the HTML5 spec are the only contenders. TagSoup has the constraint "quod scripsit, scripsit": it cannot recall SAX events and issue new...
... The HTML 5 spec. is not what I would call declarative -- discursive, more like it. ... Understood. I guess what I am thinking about is not shipping some ...
... I stated the constraint badly: I can and do postpone SAX events, but not character events, since they are unbounded in size. They at least must be...
It's been a long time for me, but doesn't the main verb need to be pluperfect, and the clause in the subjunctive? Quod scripserit, scripserat. ... From:...
... Well, in the Vulgate Pilate says "Quod scripsi, scripsi" = "What I have written, I have written", when the Jews ask him to take down the sign saying "Jesus...
I guess attempting to correct the bible is pretty much a definition of hubris. Sorry for the distraction. ... From: tagsoup-friends@yahoogroups.com ...
Hi, I am currently using TagSoup 1.0.4. I am having a problem with the XML result tree after parsing the HTML source document. The XML result document will have...
... If you can apply XPath to your input document, then it is already well-formed XML, and TagSoup is not appropriate. The purpose of TagSoup is to process...
I'm using HTMLScanner as the first step in my shift-step experiment, and basically it's working OK (after hacking a workaround to cope with XML-style empty...
... Um, well, yeah, y'see, that's a bit of debugging logic that didn't get commented out. It's probably been in there because I haven't revisited PYX output...
Hi John, I've been putting TagSoup 1.0.4 through a couple of fringe cases and ... I can crash the parser by feeding it the following character sequences: this...
... Arrgh. I've added this to TODO to fix when I get a chance. ... This is probably part of the same problem; I need to re-engineer this part of the html.stml...
Hello TagSoup Friends, I am new to tag soup. Can someone point me to getting started docs? or some code snippets on how to use TagSoup to parse HTML into DOM? ...
... It depends on which DOM implementation you are using. With the exception of Xerces (which has a built-in parser), most DOMs allow you to specify the SAX...
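As a rough illustration of the general idea (any SAX XMLReader can feed a DOM builder), here is a minimal sketch that uses only the JDK's JAXP identity transformer. The class name SaxToDom and the helper toDom are made up for this example. To stay runnable without extra jars it uses the JDK's default XML parser; the assumption is that with TagSoup on the classpath you would pass an instance of org.ccil.cowan.tagsoup.Parser as the XMLReader instead, and could then feed it real-world HTML rather than well-formed markup.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.sax.SAXSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

class SaxToDom {
    // Hypothetical helper: runs the given SAX reader over the markup and
    // collects the resulting events into a DOM Document.
    static Document toDom(XMLReader reader, String markup) throws Exception {
        // A Transformer created without a stylesheet performs an identity copy.
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        DOMResult result = new DOMResult();
        SAXSource source = new SAXSource(reader, new InputSource(new StringReader(markup)));
        identity.transform(source, result);
        return (Document) result.getNode();
    }

    public static void main(String[] args) throws Exception {
        // Stand-in reader from the JDK so the sketch runs as-is.
        // Assumption: with TagSoup available, replace the next three lines with
        //   XMLReader reader = new org.ccil.cowan.tagsoup.Parser();
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();

        Document doc = toDom(reader, "<html><body><p>Hello</p></body></html>");
        System.out.println(doc.getDocumentElement().getNodeName());
    }
}
```

Going through the identity transformer avoids depending on any particular DOM implementation's builder API, which is the part that differs between DOM libraries.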
... I've added the patch, though without the long comment, to the prerelease version. Note that UTF-16 HTML will also be handled properly with this patch. -- ...
This is yet another bug-fix release. The main issue was with HTML comments, which were very badly broken -- any > character would terminate one, so commenting...
Thanks for the update, John! (Sorry, my original email to you didn't get cc:'ed here properly because of Yahoo account setup issues on my part.) Regarding the...
TagSoup 1.1 is just TagSoup 1.0.5 with Tatu Saloranta's JAXP package added. (ERH can start gagging now.) It doesn't affect the things you could do with...
... Hi, I thought I'd better let you know that the http://home.ccil.org/~cowan/XML/tagsoup/tagsoup-1.1-src.zip archive contains ".svn" directories. I'm not...
... Not only am I going to gag. I'm going to claim it's buggy and actively dangerous as implemented. The instructions say: "To use TagSoup within the JAXP...
Elliotte Harold
elharo@...
Mar 25, 2007 3:48 pm
731
... Dangerous? It's not a security hole, after all. ... Very true. ... I quite agree; I wouldn't dream of doing anything else myself. ... Not gonna do that....
... Yes. While the wording may not be optimal (it is incomplete), this is how JAXP is supposed to work. If a developer chooses to do that, so be it. Whether JAXP...
... In that case, you should probably update the web page to suggest this approach rather than setting the system property. ... A need for a SAXParser...
Elliotte Harold
elharo@...
Mar 25, 2007 6:10 pm
734
... JAXP is broken by design and I frequently recommend against it. ... I think it makes a great deal of sense. What is the actual use case being addressed...
Elliotte Harold
elharo@...
Mar 25, 2007 6:14 pm
735
... Done. ... /me shrugs. -- John Cowan http://ccil.org/~cowan cowan@... Economists were put on this planet to make astrologers look good. --Leo...
... That's good advice if you are writing an application or defining your own framework, but not if you are writing to conform to an externally defined...
... The problem is TagSoup doesn't conform to this externally defined specification, even if the interfaces all match. The specification for SAXParserFactory...
Elliotte Harold
elharo@...
Mar 25, 2007 7:03 pm
738
... No contest there, it is pretty flawed as APIs go. ... Since I didn't need this originally myself, I cannot say definitively whether one or both are needed,...