... No. TagSoup interprets entity references on input, but does not regenerate them on output. But if you set the output encoding to something other than...
I feel like I've seen this discussed at some point in the past 5 years, but I can't remember or find the answer. If an HTML page has an ampersand in the text,...
... Yes, it should be handled (and returned as a raw &, to be escaped on output as &amp;). ... @#$*, I thought I got rid of that class of bug. Apparently the...
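To illustrate the reply above, here is a minimal sketch of escaping a raw ampersand when text is written back out. The class and method names (AmpersandEscape, escapeText) are hypothetical and are not taken from TagSoup; the sketch only shows the general idea that character data arrives unescaped and is escaped again on output.

```java
public class AmpersandEscape {
    // Hypothetical helper (not TagSoup's code): escapes the characters that
    // must not appear raw in XML character data.
    static String escapeText(String raw) {
        StringBuilder out = new StringBuilder(raw.length() + 8);
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                default:  out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // A parser reports the text "AT&T" with a raw '&';
        // a serializer has to write it as "AT&amp;T".
        System.out.println(escapeText("AT&T"));
    }
}
```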
Hi all, What is the best way to unit test the parser methods like startElement(), endElement(), ... one at a time, and by starting from reading an XML file...
... You got me there. Parsing is inherently a tightly coupled group of behaviors, since everything depends on building up a rather complex and varying state. ...
... Almost by definition unit testing doesn't read files. Passing your own arguments is the right way to *unit* test. That said, it is important to test with...
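A rough sketch of what "passing your own arguments" could look like for a SAX handler, using only JDK classes. The handler (LinkCollector) and its fields are hypothetical, invented for illustration; the point is that startElement() and endElement() are called directly, with no parser and no file involved.

```java
import org.xml.sax.Attributes;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

public class HandlerUnitSketch {
    // Hypothetical handler under test: counts <a> elements, remembers the
    // last href seen, and tracks nesting depth.
    static class LinkCollector extends DefaultHandler {
        int links = 0;
        int depth = 0;
        String lastHref = null;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            depth++;
            if ("a".equals(localName)) {
                links++;
                lastHref = atts.getValue("href"); // lookup by qualified name
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            depth--;
        }
    }

    public static void main(String[] args) {
        LinkCollector handler = new LinkCollector();

        // Build the arguments by hand instead of reading a document.
        AttributesImpl atts = new AttributesImpl();
        atts.addAttribute("", "href", "href", "CDATA", "http://example.org/");

        handler.startElement("", "a", "a", atts);
        handler.endElement("", "a", "a");

        System.out.println(handler.links + " " + handler.lastHref + " " + handler.depth);
    }
}
```

Each callback can be exercised in isolation this way, although any state the handler builds up across calls still has to be set up by the test itself.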
Elliotte Harold
elharo@...
Feb 16, 2009 2:42 pm
1254
Thank you for your answer. Your proposal tends to indicate that we need to go for an intrusive solution in which we modify the real code to throw exceptions...
Yes, I agree, and that is what I am doing for the time being. I don't read files but I get my test input from unit test strings. BR, CP. ...
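As a sketch of taking test input from strings rather than files, the following feeds an in-memory string to the JDK's own SAX parser and records the events it reports. The class and method names (StringInputSketch, eventsFor) are hypothetical. The JDK parser is used here only so the example is self-contained; substituting another XMLReader implementation is an assumption that is not shown.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class StringInputSketch {
    // Hypothetical helper: parses the given string and returns the element
    // events seen, in order, as "start:name" / "end:name".
    static List<String> eventsFor(String xml) throws Exception {
        final List<String> events = new ArrayList<>();

        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // so localName is populated
        XMLReader reader = factory.newSAXParser().getXMLReader();

        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                events.add("start:" + localName);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                events.add("end:" + localName);
            }
        });

        // The test input lives in the test itself; no file is opened.
        reader.parse(new InputSource(new StringReader(xml)));
        return events;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(eventsFor("<p><b>hello</b></p>"));
    }
}
```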
... Without more context, I simply can't say. -- John Cowan cowan@... http://ccil.org/~cowan The penguin geeks is happy / As under the waves they lark ...
With TSaxon, the -H switch allows one to process (ill-formed) HTML files when they are the source. What about when the source file is XML and you're trying to...
... I don't know any way to do that. The -H switch is just shorthand for the Saxon switch '-x org.ccil.cowan.tagsoup.Parser', and that affects both the main...
I want to use TagSoup to process an HTML page (a malformed one), and I got it to work using the command-line -H flag. However, when I tried it in code, following...
As a followup: I ended up having to pass the output from tagSoup v1.2 into a build of htmlTidy in order to get it to parse in TinyXML for certain html samples...
... Looks like TinyXML is not a conforming XML parser, if it doesn't understand character references. To get UTF-8 output without entities, though, just...
... Erm, I hate to be slightly rude, but haven't we had the conversation about the command line problems re: output encodings and win32? I started this whole...
The documentation for XMLWriter says * <p>According to the XML Recommendation, <em>all</em> whitespace * in an XML document is potentially significant to an...
... If you look at the Infoset, you'll see that whitespace outside the root element is generally considered nonsignificant, despite the letter of the XML Rec....
... [mailto:tagsoup-friends@yahoogroups.com] On Behalf Of John Cowan ... TagSoup 1.2 ... whitespace ... root ... question. John, Thank you for your quick...
... Sorry, quite right. Since I don't use Windows, I have no idea why the output encoding is broken (if that's really what's happening). Can someone using...
I have found a 1811 line xhtml file with unbalanced tags that causes Tagsoup to go into a loop. Is there a procedure for reporting such problems? regards, tom...
... xmlns="http://www.w3.org/1999/xhtml"><body><b>hello</b><i>there</i> ... elementLevel == 1, ... thus ... I misspoke in the quoted text at the top of this...
... Yes, you're right. I've never paid attention to this before. ... Aha. ... I agree: line 632 should just be flushed. -- Clear? Huh! Why a four-year-old...
Thanks! And to correct another typo for the record, I'm sending fragments, not fragmenents, which I guess is a back-formation from documenents. Leigh. ... ...
Hi, In a project where we use TagSoup to tidy some malformed XHTML code, we have found that if there is an odd number of quotes in the doctype declaration, TagSoup...
Miguel Garcia
miguel.garcia@...
Mar 18, 2009 10:58 am
1276
... The real problem is that TagSoup thinks the system-id begins with a quote and ends with a quote, but doesn't realize that it's zero-length. The obvious...