Mike Bremford sent me a patch that causes TagSoup to send the system and public IDs to the LexicalHandler if there is a DOCTYPE declaration present in the...
Unless I hear from someone with a big juicy patch or a nasty bug, I'm going to turn TagSoup 1.0rc7 into TagSoup 1.0. So please test, test, test! Or if you...
Hi, when I use TagSoup to parse HTML that contains empty tags (elements with no text value), I get unexpected output. For example, when I parse the following...
... That's a bug all right. Which version of TagSoup are you using? -- John Cowan cowan@... http://www.ccil.org/~cowan O beautiful for patriot's dream...
Hi, I was trying to compile (as delivered) from source the TagSoup 1.0rc7 package, and was unable to do it. There weren't 3 variables of "public void decl(char...
This fixes a paper-bag bug that made it impossible to compile the jar-file from the released sources. I added a few bits of defensive programming as well, but...
A long time ago, when I was using Yacc, I could tell Yacc to trace all its shift/reduce actions so that I could see what it was doing and find...
... There isn't at present. 1.0rc8 provides this facility at the scanner level (the analogue of Lex) but not at the parser level. If anyone wants to work out...
... You can use XSLT to filter out things you don't like. The reason it's supplied is that TagSoup is an SGML parser (although a very unusual one), and SGML...
I updated to the rc8 from rc6 and I'm getting the message "Cannot have a public ID without a system ID". Everything worked fine under rc6. Google search leads...
... Versions before rc7 didn't report any publicids or systemids, and apparently the NUX layer is upset because HTML documents can have publicids without...
As noted in my last posting, rc9 differs from rc8 only in returning a systemid of "" if there is a publicid but no systemid in the DOCTYPE declaration. -- John...
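The public-ID/system-ID exchange above can be illustrated from the consumer's side. The following is a minimal sketch, not TagSoup's own code: `DoctypeRecorder` is a hypothetical name, and it assumes a parser might pass `null` as the system ID when the DOCTYPE carries only a public ID. It records what `startDTD` reports and substitutes `""` in that case, mirroring the behaviour described for rc9.

```java
import org.xml.sax.ext.LexicalHandler;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: records the DOCTYPE information a SAX parser reports
// through LexicalHandler.startDTD.
class DoctypeRecorder extends DefaultHandler implements LexicalHandler {
    String name;
    String publicId;
    String systemId;

    @Override
    public void startDTD(String name, String publicId, String systemId) {
        this.name = name;
        this.publicId = publicId;
        // Assumption: a missing system ID may arrive as null. Downstream code
        // that insists on a system ID whenever a public ID is present gets "".
        this.systemId = (publicId != null && systemId == null) ? "" : systemId;
    }

    // The remaining LexicalHandler callbacks are not needed for this sketch.
    @Override public void endDTD() {}
    @Override public void startEntity(String name) {}
    @Override public void endEntity(String name) {}
    @Override public void startCDATA() {}
    @Override public void endCDATA() {}
    @Override public void comment(char[] ch, int start, int length) {}
}
```

A handler like this would normally be registered on an `XMLReader` through the standard SAX2 property `http://xml.org/sax/properties/lexical-handler`.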
Hello everybody! I plan to use TagSoup to parse HTML, but I would like to modify the HTML code as little as possible because it's generated by an editor and the ...
... I'm not sure what you mean by "modifying it". There is no need to modify the HTML before TagSoup gets it. What TagSoup returns is of course modified,...
Hello! Thank you for your answer. I will try to explain my case better. I would like to process the code generated by the editor (that is, badly formed HTML, for...
... This is exactly what TagSoup does. It does not "modify" its input in the sense of changing it to be something else. ... Exactly right. ... I meant simply...
Another small change: There is a switch --norestart to prevent restartable elements from being restarted. This is the end of my current plans for TagSoup. I...
Leigh Dodds provides an on-line TagSoup service; see http://xmlarmyknife.org/docs/xhtml/tagsoup/ for details. -- Your worships will perhaps be thinking...
For some reason, zip was generating corrupt jar files, and JDK 1.4's jar was hanging on my build system. GNU fastjar, which is /usr/bin/jar on my system, is...
I'm parsing some HTML and I appear to be losing some whitespace in the output. My input is: <html>safe<link rel="stylesheet" type="text/css"...
David Pashley, david@..., Jun 23, 2006 10:57 pm, #546
... Neither. In SGML (and HTML is an SGML application), whitespace within an element, explicit or implicit, that is known to allow only child elements and not...
... I would quite like this data. Could this be reported via ignorableWhitespace? Would you add a patch which did this? Does TagSoup ever call that callback?...
David Pashley, david@..., Jun 25, 2006 8:08 am, #548
... Okay, that was quite simple. I can't find any calls to ignorableWhitespace, so I'm assuming that calling it won't break anything. I could be wrong, so I'll...
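If a parser did report such whitespace through `ignorableWhitespace`, a consumer that wants to keep it could treat both callbacks alike. This is a minimal sketch of that consumer side only: `WhitespaceKeepingHandler` is a hypothetical name, and nothing here reflects the actual patch discussed in the thread.

```java
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: accumulates text from both characters() and
// ignorableWhitespace(), so whitespace reported through either SAX callback
// survives in the collected output.
class WhitespaceKeepingHandler extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void ignorableWhitespace(char[] ch, int start, int length) {
        // DefaultHandler ignores this callback by default; here it is kept.
        text.append(ch, start, length);
    }

    String text() {
        return text.toString();
    }
}
```

Without the `ignorableWhitespace` override, whitespace delivered through that callback would be silently dropped by `DefaultHandler`.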