I've loaded a page and used SAX2DOM to create a DOM tree. I then used XPathAPI.selectSingleNode to get a starting point and traversed the subtree. Curiously,...
... Not without something to work from. I need the input page and some information on what XPaths returned what, or a dump of the DOM generated by SAX2DOM as...
hi, I would like to make TagSoup blind to user tags. For example, in the following HTML, I would like it to simply ignore (AKA be blind to) the <tag> tag in the...
... on out. ... And these many months later I finally remember this conversation and do something about it: http://jira.codehaus.org/browse/MAVENUPLOAD-1127 ...
Hello, I'm trying to make the handling of < characters more forgiving. By default a < surrounded by space seems to get converted to a &lt; which is good. But...
... Just to explain this output, I'm pretty much just outputting XML as it comes through, so basically TagSoup is interpreting <- as the start of a tag called...
... Fair enough. It's really, really hard for the code to decide which uses of < are plausible tags or other things and which are not, since it proceeds like...
... No, not in xml (it is legal after first char though) ... I guess so, since underscore is legal as the first name char. On the other hand, all HTML tags...
Hi - we're using TagSoup happily with the Xalan XSLT replacement, and we're wondering about the bug that makes the default version not work correctly... Is it...
... The bug is about building, not about using; TagSoup doesn't do any XSLT at run time. As for why the XSLT building transform doesn't work with the default...
Hello, I just started using tagsoup so I don't know if this is normal behavior or a bug or wrong arguments. I'm using the version tagsoup-1.0.1.jar and here is...
... I admit that's not very good, but it's not clear what general method would be better. Currently TagSoup assumes that "0 CELLPADDING=" is the value of the...
Hi there! We are using TagSoup for our Web crawler, and we found that for the page at http://www.borngayprocon.org/ TagSoup treats <!-[if IE]> as a comment, and ...
Eugeny N Dzhurinsky
bofh@...
Dec 7, 2006 9:10 am
590
Hi! I have recently come across TagSoup and want to see whether I can use it instead of JTidy. I need to be able to clean up HTML documents in a wide range of ...
... That is because TagSoup does not know which characters can be safely written to which encodings, so it plays safe and uses character references for all...
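To illustrate the conservative strategy described in the reply above, here is a minimal sketch, not TagSoup's actual code. It assumes that only ASCII is safe in every output encoding and writes everything else as a decimal numeric character reference. The class and method names (`CharRefSketch`, `escapeNonAscii`) are hypothetical, and escaping of markup characters such as `<` and `&` is deliberately left out.

```java
public class CharRefSketch {

    // Hypothetical helper (not part of TagSoup): replaces every non-ASCII
    // code point with a decimal numeric character reference such as &#233;.
    // Assumption: ASCII is the only range treated as safe for all encodings.
    static String escapeNonAscii(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            // Read a full code point so surrogate pairs become one reference.
            int cp = text.codePointAt(i);
            if (cp < 0x80) {
                out.append((char) cp);
            } else {
                out.append("&#").append(cp).append(';');
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // U+00E9 is e-acute, U+20AC is the euro sign.
        System.out.println(escapeNonAscii("caf\u00e9 \u20ac5"));
    }
}
```

The trade-off is the one the reply hints at: the output is valid in any ASCII-compatible encoding, at the cost of being less readable than raw characters would be.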
I brought up conditional IE comments a while back. I showed using some pathological examples of IE conditionals that it's impossible to produce proper SAX events if...
... Quite so. But there is a bug involving comments that lack the second minus sign: <!-foo--> causes TagSoup to malfunction. -- John Cowan cowan@......
Hello! We faced another problem - when parsing an HTML document that contains a link like <a href="something.php?param=value&cap=anothervalue">, the &cap is...
Eugeny N Dzhurinsky
bofh@...
Dec 15, 2006 8:15 am
597
... I haven't had a chance to evaluate it yet. -- Well, I have news for our current leaders John Cowan and the leaders of tomorrow: the Bill of...
... Because people do in fact often leave the final semicolon off. Browsers know which attributes contain URIs and apparently don't expand entity references in...
... You're probably aware of this, but it's slightly more complicated than that (when isn't it?) because: 1. The semicolon may be omitted under certain...
Nick Fitzsimons
nick@...
Dec 15, 2006 4:16 pm
610
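On the `&cap` thread above: `cap` happens to be the name of an HTML entity (the intersection sign, U+2229), which is why a bare `&cap` inside a query string is ambiguous to any parser that tolerates a missing semicolon. If you control the markup being generated, one way to sidestep the ambiguity is to write ampersands in attribute values as `&amp;`. The sketch below is only an illustration of that authoring-side fix. The class and method names are hypothetical, and it assumes the input URL is raw (not already escaped), since it would otherwise double-escape.

```java
public class AmpersandSketch {

    // Hypothetical helper: escapes every '&' in a raw URL so it can be
    // placed safely inside an HTML attribute value.
    // Assumption: the input has not been escaped before; calling this
    // twice would turn "&amp;" into "&amp;amp;".
    static String escapeForAttribute(String rawUrl) {
        return rawUrl.replace("&", "&amp;");
    }

    public static void main(String[] args) {
        String raw = "something.php?param=value&cap=anothervalue";
        System.out.println("<a href=\"" + escapeForAttribute(raw) + "\">");
    }
}
```

This does nothing for pages you do not control, which is the crawler case discussed in the thread; there the parser's own handling of unterminated entity references is what matters.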
In command line mode with TagSoup I see this: $ java -jar tagsoup-1.0.1.jar vm.html src: vm.html <?xml version="1.0" standalone="yes"?> <html...
Elliotte Harold
elharo@...
Jan 10, 2007 3:30 pm
611
... The default output of TagSoup is XML, not HTML. If you want HTML, use the --html option; you can also turn off the XML declaration separately. ... There...
I'm trying to build TagSoup from source in Eclipse. Eclipse complains about a missing HTMLSchema class. Is this the known problem with building under 1.5, or...