As a New Year's present to the TagSoup community (and to fulfill a pre-New-Year resolution of mine), I've completed development work on TagSoup 1.2. This is...
There are a great many changes, most of them fixes for long-standing bugs, in this release. Only the most important are listed here; for the rest, see the...
... Thanks. -- John Cowan cowan@... http://ccil.org/~cowan Female celebrity stalker, on a hot morning in Cairo: "Imagine, Colonel Lawrence, ninety-two...
Hello, my program takes a string from a database that contains something like: "<p>l’eau est froide</p>". The ’ entity is '. In my SAX parser, I...
... There is, but can you send me the input that provokes this crash? ... You need to build with Ant after installing Saxon, as noted on the source page. Just...
Hi, I want to use TagSoup for parsing HTML that contains some custom XML tags in the <head> section. As far as I understood the documentation, I have to add my...
... That's correct. What I don't know from your example is whether you guarantee that the XML parts are always going to be well-formed and valid internally,...
Wow, that was a lot of information, thank you! ... TagSoup ... I read that PDF file already, but I found nothing about the groups and their intentions in...
... You must be using some kind of output engine other than the supplied XMLWriter, I guess. I don't know what these are symptoms of. ... This is an excellent...
Hi everyone, First, thanks to John and others for your excellent work on TagSoup, it is one of those tools I find very useful, very often. I know this may be...
... Thank you! ... If the invalid structure is only in the HTML parts, and the RSS parts are basically well-structured, then TagSoup 1.2 should simply do the...
... Yep, I do realize I could do this. Since, like you mentioned, the RSS/Atom structure is quite extensible, I thought it'd be great to have a single pass ...
As I read the web page, setting the namespaces feature to false should result in elements in the default namespace. Instead, I get elements in the xhtml...
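For context, here is a minimal sketch of what the standard SAX namespaces feature does. It uses the JDK's built-in XML parser, not TagSoup, so it only shows the generic SAX behaviour that the message above compares against; TagSoup's own handling may differ. The class and method names (NamespacesFeatureDemo, elementUris) are made up for this illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: records the namespace URI reported for each element.
class NamespacesFeatureDemo {
    // Standard SAX2 feature name for namespace processing.
    static final String NAMESPACES = "http://xml.org/sax/features/namespaces";

    static List<String> elementUris(String xml, boolean namespaces) {
        List<String> uris = new ArrayList<>();
        try {
            // JDK parser used as a stand-in; this is NOT the TagSoup parser.
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setFeature(NAMESPACES, namespaces);
            reader.setContentHandler(new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes atts) {
                    uris.add(uri); // empty string when no namespace is reported
                }
            });
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return uris;
    }

    public static void main(String[] args) {
        String xml = "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body/></html>";
        System.out.println("namespaces=true:  " + elementUris(xml, true));
        System.out.println("namespaces=false: " + elementUris(xml, false));
    }
}
```

With the feature off, the JDK parser reports elements with an empty namespace URI, which is the behaviour the poster expected.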
A company called JezUK has released Taggle, which is a straight port of TagSoup 1.2 to C++, as part of Arabica, a C++ XML toolkit providing SAX, DOM, XPath,...
Hi John, everyone,
Version 1.2 of TagSoup occasionally throws an exception when trying to push back data to the internal PushbackReader. Examples of failing...
... Thank you very much, especially for the failing input. There was an earlier bug report to this effect, but no examples were forthcoming. ... It should...
... My pleasure. This issue can be avoided by passing a custom PushbackReader as the input. See my other e-mail about nested tags, I think that one can be...
... Problem solved! The issue arises when an & appears at the end of a line, and the line terminator is either CR-LF (Windows) or CR alone (Mac Classic), as in...
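The PushbackReader workaround mentioned in this thread relies on a general property of java.io.PushbackReader: its pushback buffer holds one character unless a larger size is given to the constructor, and unread() throws an IOException when that buffer overflows. The sketch below shows only that standard-library behaviour. It does not reproduce TagSoup's scanner, and the class and method names (PushbackDemo, canPushBack) are invented for the illustration.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.StringReader;

// Hypothetical demo class; not part of TagSoup.
class PushbackDemo {
    /**
     * Reads up to `count` characters from `input`, then tries to push them
     * all back. Returns true if unread() succeeded, false if it threw an
     * IOException (which is how PushbackReader signals buffer overflow).
     */
    static boolean canPushBack(String input, int bufferSize, int count) {
        try (PushbackReader reader =
                 new PushbackReader(new StringReader(input), bufferSize)) {
            char[] chunk = new char[count];
            int read = reader.read(chunk, 0, count);
            if (read <= 0) {
                return true; // nothing was read, so nothing to push back
            }
            reader.unread(chunk, 0, read);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // An ampersand followed by a CR-LF line terminator, as in the report.
        String input = "&\r\n<p>";
        System.out.println("buffer of 1: " + canPushBack(input, 1, 3));
        System.out.println("buffer of 8: " + canPushBack(input, 8, 3));
    }
}
```

A reader constructed with a larger buffer tolerates multi-character pushback, which is presumably why supplying one works around the exception.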
Hi John and tagsoup-friends, would it be possible to briefly describe (or provide reliable pointers to) a way to create an instance of...
Godmar Back, godmar@..., Feb 6, 2008 4:02 am, 1003
... I don't know of any HTML DOMs that have pluggable parsers, since there is no standard interface for streaming HTML parsers. Most people use XML DOMs or...
... I'll investigate that. I was intrigued by your suggestion in your 2002 talk that "SAX-to-DOM converters" were abundant; apparently, this doesn't include...
Godmar Back, godmar@..., Feb 6, 2008 5:53 am, 1005
... SAX is purely an XML standard, unless you are using an HTML-to-SAX parser like Cyberneko, TagSoup, or JTidy. ... The HTML DOM doesn't really buy you much...
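One common way to get a DOM out of any SAX event source is the JAXP identity transform from a SAXSource to a DOMResult. The sketch below uses the JDK's own XMLReader so that it runs without extra libraries; the assumption is that an HTML-to-SAX parser such as TagSoup, which exposes the org.xml.sax.XMLReader interface, could be passed to toDom() in its place. The class and method names (SaxToDom, toDom, jdkReader) are made up for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.sax.SAXSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

// Hypothetical helper class; not part of TagSoup.
class SaxToDom {
    /** Builds a W3C DOM Document from the SAX events produced by `reader`. */
    static Document toDom(XMLReader reader, String markup) {
        try {
            // A Transformer created without a stylesheet copies input to output.
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            DOMResult result = new DOMResult();
            InputSource source = new InputSource(new StringReader(markup));
            identity.transform(new SAXSource(reader, source), result);
            return (Document) result.getNode();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Stand-in reader from the JDK; assumption: swap in an HTML-to-SAX parser here. */
    static XMLReader jdkReader() {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newSAXParser().getXMLReader();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = toDom(jdkReader(), "<html><body><p>hello</p></body></html>");
        System.out.println(doc.getDocumentElement().getNodeName());
    }
}
```

The result is an ordinary XML DOM rather than an HTML DOM, in line with the remark above that most people use XML DOMs.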
Confirmed, works as advertised -- thanks John. For some reason my other bug report didn't get through to the list. I will try to re-send it to start ...
Another bug, this time more serious and with no apparent workaround (sorry, John). Try to run: java -jar tagsoup-1.2.jar error-67.txt > out on the ZIPped HTML...