... on out. ... And these many months later I finally remember this conversation and do something about it: http://jira.codehaus.org/browse/MAVENUPLOAD-1127 ...
Hello, I'm trying to make the handling of < characters more forgiving. By default a < surrounded by space seems to get converted to a &lt;, which is good. But...
... Just to explain this output, I'm pretty much just outputting XML as it comes through, so basically TagSoup is interpreting <- as the start of a tag called...
... Fair enough. It's really, really hard for the code to decide which uses of < are plausible tags or other things and which are not, since it proceeds like...
... No, not in xml (it is legal after first char though) ... I guess so, since underscore is legal as the first name char. On the other hand, all HTML tags...
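The question in this thread, which uses of < are plausible tag starts, can be illustrated with a minimal sketch. It is not TagSoup's algorithm: the rule below (a < opens markup only when followed by an ASCII letter, a / plus a letter, ! or ?) and the name `escape_stray_lt` are assumptions made for illustration.

```python
import re

# Hypothetical rule, not TagSoup's: "<" begins markup only when followed by
# an ASCII letter, "/" plus a letter, "!" or "?". Any other "<" is literal.
_STRAY_LT = re.compile(r"<(?![A-Za-z]|/[A-Za-z]|[!?])")


def escape_stray_lt(text: str) -> str:
    """Replace each '<' that cannot begin a tag (under the rule above) with '&lt;'."""
    return _STRAY_LT.sub("&lt;", text)


if __name__ == "__main__":
    print(escape_stray_lt("a <- b and <p>text</p>"))
```

Under this rule a name starting with an underscore is treated as text, even though XML allows it as a first name character; a stricter or looser rule is a one-line change to the pattern.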
Hi - we're using TagSoup happily with the Xalan XSLT replacement, and we're wondering about the bug that makes the default version not work correctly... Is it...
... The bug is about building, not about using; TagSoup doesn't do any XSLT at run time. As for why the XSLT building transform doesn't work with the default...
Hello, I just started using tagsoup so I don't know if this is normal behavior or a bug or wrong arguments. I'm using the version tagsoup-1.0.1.jar and here is...
... I admit that's not very good, but it's not clear what general method would be better. Currently TagSoup assumes that "0 CELLPADDING=" is the value of the...
Hi there! We are using TagSoup for our Web crawler, and we found that for the page at http://www.borngayprocon.org/ TagSoup considers <!-[if IE]> as a comment, and ...
Eugeny N Dzhurinsky
bofh@...
Dec 7, 2006 9:10 am
590
Hi! I have recently come across TagSoup and want to see whether I can use it instead of JTidy. I need to be able to clean up HTML documents in a wide range of ...
... That is because TagSoup does not know which characters can be safely written to which encodings, so it plays safe and uses character references for all...
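The play-safe strategy described in this reply can be sketched as follows, assuming the elided text refers to characters outside ASCII. This uses Python's standard `xmlcharrefreplace` error handler and is only an illustration of the idea, not TagSoup's own output code; the function name is hypothetical.

```python
def to_ascii_with_char_refs(text: str) -> str:
    """Write text so it is safe in any ASCII-compatible encoding.

    Every character outside ASCII becomes a decimal character reference.
    Markup characters such as '&' and '<' are NOT escaped by this step.
    """
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


if __name__ == "__main__":
    print(to_ascii_with_char_refs("café"))  # caf&#233;
```

The trade-off is readability of the serialized output: the result is correct under any ASCII-compatible encoding, but text in non-Latin scripts turns entirely into numeric references.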
I brought up conditional IE comments a while back. I showed, using some pathological examples of IE conditionals, that it's impossible to produce proper SAX events if...
... Quite so. But there is a bug involving comments that lack the second minus sign: <!-foo--> causes TagSoup to malfunction. -- John Cowan cowan@......
Hello! We faced another problem - when parsing an HTML document that contains a link like <a href="something.php?param=value&cap=anothervalue">, the &cap is...
Eugeny N Dzhurinsky
bofh@...
Dec 15, 2006 8:15 am
597
... I haven't had a chance to evaluate it yet. -- Well, I have news for our current leaders John Cowan and the leaders of tomorrow: the Bill of...
... Because people do in fact often leave the final semicolon off. Browsers know which attributes contain URIs and apparently don't expand entity references in...
... You're probably aware of this, but it's slightly more complicated than that (when isn't it?) because: 1. The semicolon may be omitted under certain...
Nick Fitzsimons
nick@...
Dec 15, 2006 4:16 pm
610
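One way to sidestep the &cap problem discussed in this thread is to escape bare ampersands in URL attribute values before parsing. The sketch below is a hypothetical preprocessing step, not something TagSoup or browsers are stated to do; it assumes that only semicolon-terminated references should be preserved, which is stricter than the behavior the replies describe.

```python
import re

# An ampersand is kept only if it starts a semicolon-terminated reference:
# named (&amp;), decimal (&#38;) or hexadecimal (&#x26;). Assumed rule.
_BARE_AMP = re.compile(
    r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)"
)


def escape_bare_ampersands(url: str) -> str:
    """Rewrite each '&' that does not begin a terminated reference as '&amp;'."""
    return _BARE_AMP.sub("&amp;", url)


if __name__ == "__main__":
    print(escape_bare_ampersands("something.php?param=value&cap=anothervalue"))
```

Because the rule requires the semicolon, a query parameter named like an entity (here `cap`) is never expanded, at the cost of also escaping references whose author really did omit the final semicolon.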
In command line mode with TagSoup I see this: $ java -jar tagsoup-1.0.1.jar vm.html src: vm.html <?xml version="1.0" standalone="yes"?> <html...
Elliotte Harold
elharo@...
Jan 10, 2007 3:30 pm
611
... The default output of TagSoup is XML, not HTML. If you want HTML, use the --html option; you can also turn off the XML declaration separately. ... There...
I'm trying to build TagSoup from source in Eclipse. Eclipse complains about a missing HTMLSchema class. Is this the known problem with building under 1.5, or...
Elliotte Harold
elharo@...
Jan 10, 2007 4:06 pm
613
... That's a generated class. You need to do "ant prepare-parser", which will fail on stock 1.5, so you need to fall back to 1.4 or install a working XSLT...
... You're right. I don't think I'd ever noticed that attribute before. Still I'm not sure it should be in the output for a couple of reasons: 1. The attribute...
Elliotte Harold
elharo@...
Jan 10, 2007 4:20 pm
615
... I think I've set ant up to use 1.4 but Ant is still giving me messages about "java.lang.ClassNotFoundException: ...
Elliotte Harold
elharo@...
Jan 10, 2007 4:42 pm
616
... I have no clue. Try upgrading Xalan in the endorsed directory instead. -- Business before pleasure, if not too bloomering long before. --Nicholas van Rijn ...