John, I followed your advice and subscribed to the TagSoup list. As I was saying, many popular sites have lots of nested JavaScript in their HTML and TagSoup...
... This definitely seems to be a result of the known problem with detected end-tags in script and style elements. -- John Cowan www.ccil.org/~cowan...
... It sounds like you'd be better off with jchardet, a Java port of the Mozilla encoding guesser. Its result can be set into the InputSource object you pass...
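A minimal sketch of the last step mentioned above: handing a guessed encoding to the SAX InputSource before parsing. The class name EncodingHint, the helper withEncoding, and the hard-coded "ISO-8859-1" are illustrative assumptions; in practice the charset string would come from whatever the encoding guesser reports, and no jchardet calls are shown here.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import org.xml.sax.InputSource;

class EncodingHint {
    // Wraps a byte stream in an InputSource and records the guessed charset.
    // guessedCharset stands in for the result of an external encoding guesser.
    static InputSource withEncoding(InputStream bytes, String guessedCharset) {
        InputSource source = new InputSource(bytes);
        if (guessedCharset != null) {
            // Tells the parser which encoding to use when decoding the bytes.
            source.setEncoding(guessedCharset);
        }
        return source;
    }

    public static void main(String[] args) {
        byte[] html = "<html><body>caf\u00e9</body></html>"
                .getBytes(StandardCharsets.ISO_8859_1);
        InputSource source =
                withEncoding(new ByteArrayInputStream(html), "ISO-8859-1");
        System.out.println("Encoding passed to the parser: " + source.getEncoding());
    }
}
```

The resulting InputSource can then be passed to the parse method of any SAX XMLReader.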
Hello, TagSoup + XOM here. I get an error somewhere deep in my XML manipulations that emerges as a ParsingException and the message "-1" :( Unfortunately, I...
There seems to be an amazing number of pages out there with multiple body tags! I guess this comes from people doing includes of whole pages. It would be nice...
... That's one source. An old bug in early versions of Netscape meant that background-color attributes in multiple body tags would be interpreted dynamically,...
... I understand. I am parsing real (i.e. ugly) HTML using XOM's NodeFactory. What's the best strategy to remove those extra body tags? I tried using booleans...
... I am using XOM's NodeFactory to parse raw HTML. My problem is that I am using the body closing tag as the cue point to start collecting statistics about a...
... XSLT is your friend; so is the full use of the XOM model. You are trying to strain the limits of a streaming API beyond what's reasonable. -- John Cowan...
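One way the XSLT suggestion could look, as a rough sketch rather than anything posted in the thread: an identity transform that folds the children of later sibling body elements into the first one. It assumes the document is already well-formed XML, that the extra body elements are siblings, and that elements are in no namespace; if the parser reports elements in a namespace, the patterns would need a prefix bound to it. The class name MergeBodies and the sample input are made up for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class MergeBodies {
    // Identity transform plus two overrides: the first body absorbs the
    // children of all sibling bodies, and every later body is dropped.
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='@*|node()'>"
      + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
      + "</xsl:template>"
      + "<xsl:template match='body[1]'>"
      + "<xsl:copy>"
      + "<xsl:apply-templates select='@*'/>"
      + "<xsl:apply-templates select='../body/node()'/>"
      + "</xsl:copy>"
      + "</xsl:template>"
      + "<xsl:template match='body[position() != 1]'/>"
      + "</xsl:stylesheet>";

    static String merge(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transform failed", e);
        }
    }

    public static void main(String[] args) {
        String input = "<html><body><p>a</p></body><body><p>b</p></body></html>";
        System.out.println(merge(input));
    }
}
```

Because the whole tree is available to the stylesheet, no flags or counters are needed to track which body tag has been seen, which is the awkward part of doing this in a streaming callback.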
Finally a new release of TagSoup and TSaxon. Summary of changes: Convert CR and CRLF to LF in comments and PIs; Force empty elements to close immediately; Match...
Hello, I know I won't be able to do a compilation under jdk 1.1.8 directly, but is the code (including the code generated from the xslt transformations)...
... Well, you'd have to go through and change references to HashMap into Hashtable, but that's all I can think of offhand. ... Thank you. -- At the end of the...
... I've posted a note to this effect to the TagSoup home page. Let me know if you want ... used instead of "thunderbearshammer". Vynne = hammer? -- John...
John, et al., Could I request a small change to the home page, to say explicitly that you get to choose the license, AFL 2.1 or GPL? I've been told that the...
Thank you! I pressed reload and it magically appeared! ...
John, Thank you. Xerox will likely be using the TagSoup JAR file binary in one of our products. This change makes it clear to our lawyers that we have the...
Once the product revision is launched, if it does include TagSoup then we will put the TagSoup attribution in the about page of the product. I will likely not...
... Okay. Since the fact will then be published (since this list is publicly archived), I'll take the liberty of mentioning it. -- Why are well-meaning...
The 1.0.2rc3 release uses null as the initial value for the options HashMap in CommandLine.java in the new command-line parsing code. Since java.util.Hashtable...
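For anyone swapping HashMap for Hashtable, one documented difference is worth checking: java.util.HashMap accepts null values, while java.util.Hashtable throws NullPointerException for a null key or value. The sketch below only demonstrates that difference; the class name NullValues and the option name are hypothetical and are not taken from CommandLine.java.

```java
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

class NullValues {
    // Returns true if the map accepts a null value for the given key.
    static boolean acceptsNullValue(Map<String, Object> map, String key) {
        try {
            map.put(key, null);
            return true;
        } catch (NullPointerException e) {
            // Hashtable rejects null keys and null values.
            return false;
        }
    }

    public static void main(String[] args) {
        String option = "--hypothetical-option"; // placeholder name
        System.out.println("HashMap accepts null:   "
                + acceptsNullValue(new HashMap<String, Object>(), option));
        System.out.println("Hashtable accepts null: "
                + acceptsNullValue(new Hashtable<String, Object>(), option));
    }
}
```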
In the piddly department, here is a minimal --help. Note that it uses Iterator because I can't see a convenient way to get an Enumeration out of a HashMap, so...
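On getting an Enumeration out of a HashMap: java.util.Collections.enumeration, available since Java 1.2 alongside HashMap itself, wraps any Collection, including a map's key set. This is only a sketch of that call, not the posted --help code; the class name KeyEnumeration and the option names are made up, and it does not help on JDK 1.1, where neither HashMap nor Collections exists.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class KeyEnumeration {
    // Walks the map's keys through an Enumeration instead of an Iterator.
    static List<String> keysViaEnumeration(Map<String, ?> options) {
        List<String> names = new ArrayList<String>();
        Enumeration<String> e = Collections.enumeration(options.keySet());
        while (e.hasMoreElements()) {
            names.add(e.nextElement());
        }
        return names;
    }

    public static void main(String[] args) {
        Map<String, Boolean> options = new HashMap<String, Boolean>();
        options.put("--alpha", Boolean.FALSE); // hypothetical option names
        options.put("--beta", Boolean.TRUE);
        // HashMap does not guarantee any particular iteration order.
        for (String name : keysViaEnumeration(options)) {
            System.out.println(name);
        }
    }
}
```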