Hi,
In a project where we use TagSoup to tidy some malformed XHTML, we have
found that if the doctype declaration contains an odd number of quotes,
TagSoup throws a String-related exception and fails. For example, with
the following input:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML Basic 1.1//EN" "> <html
xmlns="http://www.w3.org/1999/xhtml" xml:lang="en"> <head><title>Test
with bogus doctype</title></head> <body> <p>This page has an extra quote
in the doctype, which the tagsoup library doesn't like.</p> </body>
</html>
TagSoup throws the following exception:
[Fatal Error] :2:14: The document type declaration for root element type
"html" must end with '>'.
Exception in thread "main" java.lang.StringIndexOutOfBoundsException:
String index out of range: -1
I'm not sure whether patching the library would be easy (I haven't
reviewed the source code yet), or whether it would be better to add a
workaround on our side that recovers from any unexpected error from
TagSoup.
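As a rough idea of what such a workaround could look like, here is a
minimal sketch that pre-filters the markup before it reaches the parser.
It is not part of TagSoup: the class name DoctypeSanitizer and the method
dropUnbalancedDoctype are made up for illustration. It assumes the
doctype ends at the first '>' (so no internal subset) and that simply
dropping a broken doctype is acceptable for our use.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of TagSoup.
final class DoctypeSanitizer {
    // Assumption: the declaration runs from "<!DOCTYPE" to the first '>'.
    private static final Pattern DOCTYPE =
        Pattern.compile("<!DOCTYPE[^>]*>", Pattern.CASE_INSENSITIVE);

    // Returns the markup without its first DOCTYPE declaration when that
    // declaration contains an odd number of double quotes; otherwise
    // returns the markup unchanged.
    static String dropUnbalancedDoctype(String markup) {
        Matcher m = DOCTYPE.matcher(markup);
        if (!m.find()) {
            return markup; // no doctype at all, nothing to do
        }
        long quotes = m.group().chars().filter(c -> c == '"').count();
        if (quotes % 2 == 0) {
            return markup; // quotes are balanced, leave it alone
        }
        // Cut the broken declaration out and keep everything around it.
        return markup.substring(0, m.start()) + markup.substring(m.end());
    }
}
```

The cleaned string would then be handed to the parser as usual. Wrapping
the parse call itself in a try/catch for RuntimeException would be a
complementary safety net for other unexpected failures.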
Miguel