Hi, I am trying to use TagSoup with JDOM and it seems to be working for everything except CDATA within JavaScript. JDOM throws an exception as displayed below....
... JDOM is right, and that's a bug. ... It's actually correct for TagSoup to return CDATA start and end markup as text, because in script and style elements...
Thank you, I have removed the lines and it now works just fine. I could not apply the patch automatically as it seems to be distorted by Yahoo webmail... Thank...
Uploaded a test tarball where I read 2 files with document('test/filename.html'). I get an exception on the second file access. The files are pretty trivial...
... Forgot to give the Java / tagsoup details: Saxon version and stats: Saxon 8.9J from Saxonica Java version 1.4.2-03 Stylesheet compilation time: 367...
DeXSS [1] provides a SAX2 Parser to help protect against cross-site scripting (XSS) attacks. DeXSS uses TagSoup to parse potentially malformed input,...
The w3c validator doesn't like xmlns:html="....." as an attribute of the HTML node. It only loves xmlns="......." It's not hard to fix in a JDOM tree built...
... Looks like a known validator bug: <http://www.w3.org/Bugs/Public/show_bug.cgi?id=800> Regards, Nick. -- Nick Fitzsimons http://www.nickfitz.co.uk/...
Nick Fitzsimons
nick@...
May 2, 2007 5:13 pm
818
This release fixes the reporting of CDATA sections. In TagSoup 1.1 and previous versions, if you specified a SAX LexicalHandler to receive indications of...
I forgot to mention that there is now a CDATAElements SAX feature, which is the programmatic equivalent of the --nocdata switch. If you set the feature to...
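A minimal sketch of toggling that feature through the standard SAX `XMLReader.setFeature` call. The feature URI below is an assumption based on TagSoup's published feature names and is not quoted in this message; the helper class and method names are hypothetical. The excerpt is cut off before it says what `true` and `false` mean, so the sketch takes the value as a parameter rather than guessing.

```java
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;

// Hypothetical helper, not part of TagSoup.
class CdataElementsFeature {
    // Assumed feature URI for TagSoup's CDATAElements feature.
    static final String CDATA_ELEMENTS_FEATURE =
        "http://www.ccil.org/~cowan/tagsoup/features/cdata-elements";

    // Tries to set the feature; returns false if the reader does not
    // recognize or support it (for example, a non-TagSoup parser).
    static boolean trySet(XMLReader reader, boolean value) {
        try {
            reader.setFeature(CDATA_ELEMENTS_FEATURE, value);
            return true;
        } catch (SAXNotRecognizedException | SAXNotSupportedException e) {
            return false;
        }
    }
}
```

With a TagSoup parser on the classpath, the reader passed in would be an instance of `org.ccil.cowan.tagsoup.Parser`; any other `XMLReader` is expected to reject the feature and make the helper return `false`.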
Okay, okay, 1.1.1 had a paper-bag bug: you could set the CDATAElementsFeature, but it had no actual effect. My current test set didn't catch this problem,...
I posted this once for help, but didn't hear back. I did get a response from Dr. Kay on the sf.net project forum. It is a bug in tagsoup exposed by recent...
... I posted this to the forum: Okay, I've nailed the problem. The NullPointerException arises because the value of the instance variable that holds the entity...
All, This is maybe a naive question. I haven't tried TagSoup yet, but I am looking for an alternative to JTidy which would support XML tags embedded within tag...
... TagSoup does not understand namespace declarations, so your XForms tags will come out with a namespace of "urn:prefix:xforms". You can easily fix that...
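The fix proposed in this reply is cut off in the excerpt. One possible approach, not necessarily the one the author had in mind, is a SAX filter that rewrites the placeholder namespace on element events. The sketch below uses only the JDK's SAX classes; the class and method names are hypothetical, and the demo method parses well-formed XML with the JDK parser rather than TagSoup.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical filter: replaces one namespace URI with another on
// element and prefix-mapping events. Attribute namespaces are left as-is.
class NamespaceRemapFilter extends XMLFilterImpl {
    private final String fromUri;
    private final String toUri;

    NamespaceRemapFilter(XMLReader parent, String fromUri, String toUri) {
        super(parent);
        this.fromUri = fromUri;
        this.toUri = toUri;
    }

    private String remap(String uri) {
        return fromUri.equals(uri) ? toUri : uri;
    }

    @Override
    public void startPrefixMapping(String prefix, String uri) throws SAXException {
        super.startPrefixMapping(prefix, remap(uri));
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        super.startElement(remap(uri), localName, qName, atts);
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        super.endElement(remap(uri), localName, qName);
    }

    // Demo: returns the namespace URI of each element, in document order,
    // after remapping. Uses the JDK's namespace-aware parser as the source.
    static List<String> elementUris(String xml, String fromUri, String toUri) {
        final List<String> uris = new ArrayList<>();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            XMLReader reader = factory.newSAXParser().getXMLReader();
            NamespaceRemapFilter filter = new NamespaceRemapFilter(reader, fromUri, toUri);
            filter.setContentHandler(new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName, String qName,
                                         Attributes atts) {
                    uris.add(uri);
                }
            });
            filter.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return uris;
    }
}
```

To use it with TagSoup, the parent reader would be a TagSoup parser instead of the JDK one, with `"urn:prefix:xforms"` as the source URI and the XForms namespace `http://www.w3.org/2002/xforms` as the target.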
... Thanks. Would namespace mappings like xmlns:xforms="..." be swallowed by TagSoup? If not, then even if TagSoup doesn't understand them it may be possible to...
Hi all, First, apologies if these have been asked before; I scanned the archives but couldn't find any references. I've been doing some experiments comparing...
I was checking out version 1.1.3 vs. 1.0rc3 (the original version I had), to see if any of the issues in my previous email might have been resolved. I found...
... Yes, that's a very subtle bug in the scanner. If the line endings are CR+LF (Windows mode) and the end-of-comment appears just after a line ending, the...
... True. TagSoup doesn't know which attributes are URIs, but worse yet it eliminates all entity references at a low level of the code (though it's the high...
All versions of TagSoup are now, by the wave of my magic wand, licensed under the Apache 2.0 license as well as the Academic Free License 3.0 and the GNU GPL...
I was using TagSoup to parse some HTML and kept getting funny errors, and I finally realized that it was attempting to parse the tags within comments as well....