In command line mode with TagSoup I see this: $ java -jar tagsoup-1.0.1.jar vm.html src: vm.html <?xml version="1.0" standalone="yes"?> <html...
Elliotte Harold
elharo@...
Jan 10, 2007 3:30 pm
611
... The default output of TagSoup is XML, not HTML. If you want HTML, use the --html option; you can also turn off the XML declaration separately. ... There...
I'm trying to build TagSoup from source in Eclipse. Eclipse complains about a missing HTMLSchema class. Is this the known problem with building under 1.5, or...
Elliotte Harold
elharo@...
Jan 10, 2007 4:06 pm
613
... That's a generated class. You need to do "ant prepare-parser", which will fail on stock 1.5, so you need to fall back to 1.4 or install a working XSLT...
... You're right. I don't think I'd ever noticed that attribute before. Still I'm not sure it should be in the output for a couple of reasons: 1. The attribute...
Elliotte Harold
elharo@...
Jan 10, 2007 4:20 pm
615
... I think I've set ant up to use 1.4 but Ant is still giving me messages about "java.lang.ClassNotFoundException: ...
Elliotte Harold
elharo@...
Jan 10, 2007 4:42 pm
616
... I have no clue. Try upgrading Xalan in the endorsed directory instead. -- Business before pleasure, if not too bloomering long before. --Nicholas van Rijn ...
... I'm also getting some Ant deprecation messages. I'm using Ant 1.6.5. Which version of Ant are you using? -- Elliotte Rusty Harold...
Elliotte Harold
elharo@...
Jan 10, 2007 5:06 pm
618
... You've convinced me. I've removed it from the trunk, and you can do so too by just pulling the line with /name='version'/ out of src/definitions/html.tssl...
... Possibly the problems are related then. Here's what I see: ~/projects/tagsoup-1.0.1$ ant compile Buildfile: build.xml [available] DEPRECATED - <available>...
Elliotte Harold
elharo@...
Jan 10, 2007 6:52 pm
621
FWIW, Ant 1.7 was able to compile TagSoup though it still gave deprecation messages: This was under Java 1.5 by the way. Is the problem with the buggy 1.5 ...
Elliotte Harold
elharo@...
Jan 10, 2007 11:21 pm
622
... Once the code is generated, it should compile fine and run fine. The XSLT is used only to generate the tables in the classes in src/templates. -- There...
... So what's the symptom of the bug that prevents the code from being compiled under 1.5? At first glance it seems to have compiled OK for me, but I may be...
Elliotte Harold
elharo@...
Jan 10, 2007 11:57 pm
624
... The inability to do the XSLT build due to the bad version of Xalan distributed with stock 1.5: $ ant Buildfile: build.xml init: prepare: [mkdir] Created...
I'm a novice to Java and XML, and I would like to use TagSoup with JAXP, since JAXP supports XPath 2.0. It is my understanding that DOM parsers use SAX parsers...
... Unfortunately I have never used JAXP. Some parsers expose both SAX and DOM, notably Xerces; I don't know which packages, if any, allow pluggable SAX. I...
... Yes. If you must use XPath 2.0, that makes sense. Otherwise alternatives (like XOM [XPath 1.0 using Jaxen] + TagSoup) are (IMO) a superior choice. ... This...
... TagSoup does not have JAXP interfaces, but I very much invite a patch from anyone who has written one. I don't have time at the minute to figure out how...
... Go for it. That'll force me to put out a new release with the couple of patches I've accumulated. -- First known example of political correctness: John...
... Ok, here goes... I have only briefly tested it, but it seems to work via JAXP SAXBuilderFactory. The only requirement for use is to define alternate...
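For readers who only need a DOM and not a JAXP factory, one commonly used route is an identity transform from a SAXSource to a DOMResult, which accepts any XMLReader. The following is a minimal sketch, not the patch discussed in this thread. The class and method names (SaxToDom, toDom, jdkReader) are hypothetical, and the demo uses the JDK's own SAX parser as a stand-in so that it runs without TagSoup on the classpath; with TagSoup available, an instance of org.ccil.cowan.tagsoup.Parser would presumably be passed instead.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.sax.SAXSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

public class SaxToDom {
    // Builds a DOM tree from whatever SAX events the given reader emits.
    public static Document toDom(XMLReader reader, String markup) throws Exception {
        SAXSource source = new SAXSource(reader, new InputSource(new StringReader(markup)));
        DOMResult result = new DOMResult();
        // A Transformer created without a stylesheet performs an identity copy.
        TransformerFactory.newInstance().newTransformer().transform(source, result);
        return (Document) result.getNode();
    }

    // Stand-in reader from the JDK (assumption: TagSoup is not on the classpath here).
    public static XMLReader jdkReader() throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newSAXParser().getXMLReader();
    }

    public static void main(String[] args) throws Exception {
        Document doc = toDom(jdkReader(), "<html><body><p>hi</p></body></html>");
        System.out.println(doc.getDocumentElement().getNodeName());
    }
}
```

Because the stand-in reader is a strict XML parser, the demo input has to be well-formed; the point of swapping in a tag-soup parser would be to lift that restriction.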
I'm trying to enumerate the limits on what TagSoup can fix. I.e. assuming I want *valid* XHTML output after TagSoup is through, what do I still have to look...
Elliotte Harold
elharo@...
Jan 15, 2007 5:32 pm
634
When running TagSoup from the command line via "java -jar tagsoup-1.0.1.jar" what is the output encoding? UTF-8? Something else? Would it be possible to add a...
Elliotte Harold
elharo@...
Jan 15, 2007 5:33 pm
635
... UTF-8. ... It would be messy. I'll add it to the documentation instead.
... Here's my best shot at a general answer: Insert a DOCTYPE declaration. Ensure that all required elements ("html", "head", "title", and "body") are present....
I have a file that I have created to be deliberately poorly formatted and I'm trying to get a 'fixed' version of the HTML so that I can convert it to DOM and...
Hi, Are there javadocs for TagSoup somewhere? I can't find them anywhere (not in the source, not on the web). Or did I miss something very obvious? Thanks raph...
... You can generate them yourself by saying "ant docs". However, TagSoup does not have any unique API; it provides only the SAX API, which you can read about...
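Since the reply above points to the plain SAX API, a short sketch of what client code against that API looks like may help. It registers a ContentHandler on an XMLReader and records each element name as it is reported. The names ElementLister, elementNames, and jdkReader are hypothetical, and the JDK's built-in SAX parser stands in for TagSoup so the example is self-contained; the assumption is that TagSoup's org.ccil.cowan.tagsoup.Parser could be passed as the reader in its place.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class ElementLister {
    // Collects element names in document order using only the SAX API.
    public static List<String> elementNames(XMLReader reader, String markup) throws Exception {
        final List<String> names = new ArrayList<>();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                names.add(qName);
            }
        });
        reader.parse(new InputSource(new StringReader(markup)));
        return names;
    }

    // Stand-in reader from the JDK (assumption: TagSoup is not on the classpath here).
    public static XMLReader jdkReader() throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newSAXParser().getXMLReader();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(elementNames(jdkReader(), "<html><body><p>hi</p></body></html>"));
    }
}
```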
In doing a little more research on this, I understand now that TagSoup does not modify the "input," which is fine, but I do need a way to get the modified...
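One way to capture a parser's repaired event stream as text, rather than modifying the input, is to feed the reader into an identity transform that writes to a string. This is a sketch under assumptions: the names SaxToString, serialize, and jdkReader are hypothetical, the JDK SAX parser is a stand-in for TagSoup's org.ccil.cowan.tagsoup.Parser, and the output method is forced to "xml" because a transformer left to its defaults may pick HTML serialization when the root element is html.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

public class SaxToString {
    // Serializes the SAX events produced by the reader back into markup text.
    public static String serialize(XMLReader reader, String markup) throws Exception {
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.METHOD, "xml");
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        SAXSource source = new SAXSource(reader, new InputSource(new StringReader(markup)));
        identity.transform(source, new StreamResult(out));
        return out.toString();
    }

    // Stand-in reader from the JDK (assumption: TagSoup is not on the classpath here).
    public static XMLReader jdkReader() throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newSAXParser().getXMLReader();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(serialize(jdkReader(), "<html><body><p>hi</p></body></html>"));
    }
}
```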