Currently TagSoup's behavior about entity references is as follows. If an entity is recognized by the schema, such as , it is turned into a single...
... I wouldn't do that, as you do see people using, for instance, é in alt attributes. You could restrict that behaviour to href attributes...
Robin Berjon
robin.berjon@...
Feb 12, 2004 10:52 am
21
JC: Clearly this can be fixed by being smart about not inserting ; when the entity reference is unknown. But I'm wondering if it wouldn't just be better to...
... The SGML behavior could be something to consider. This is off the top of my head, and probably not exactly correct, but I believe an SGML parser that finds...
... I do that too. But unlike an SGML parser, I can't just cough and die in either of the two bad cases: unknown entity and missing semicolon. Too many HTML...
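The two bad cases named above (unknown entity, missing semicolon) can be handled leniently rather than fatally. What follows is only a minimal sketch of one such policy, not TagSoup's actual code: the class name `LenientEntities`, the tiny `KNOWN` table, and the regular expression are all assumptions made for illustration. Known names are expanded with or without a trailing semicolon; unknown references are passed through exactly as written, with no semicolon added.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only; not part of TagSoup.
class LenientEntities {
    // Assumed, deliberately tiny entity table; a real schema would hold many more.
    private static final Map<String, String> KNOWN = Map.of(
        "amp", "&", "lt", "<", "gt", ">", "nbsp", "\u00A0", "eacute", "\u00E9");

    // An entity name followed by an optional semicolon.
    private static final Pattern REF = Pattern.compile("&([A-Za-z][A-Za-z0-9]*)(;?)");

    static String expand(String text) {
        Matcher m = REF.matcher(text);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String value = KNOWN.get(m.group(1));
            // Known: substitute the character. Unknown: keep the text verbatim,
            // so no semicolon is invented where the author wrote none.
            String replacement = (value != null) ? value : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

With this policy, a query string such as `?a=1&foo=1` in an href survives untouched because `foo` is not in the table, while `&amp` without a semicolon still becomes `&`.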
I just got an off-list request to add support for HTML comments through the LexicalHandler interface. I wonder if anyone else thinks this feature is useful. ...
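For readers unfamiliar with the interface mentioned here, the sketch below shows how a SAX application receives comments through `org.xml.sax.ext.LexicalHandler`. It uses the JDK's built-in XML parser as a stand-in, since whether and how TagSoup exposes this is exactly what the message is asking about; the class name `CommentCollector` is hypothetical. The property URI is the standard SAX2 one.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical example class; DefaultHandler2 implements LexicalHandler with no-op methods.
class CommentCollector extends DefaultHandler2 {
    final List<String> comments = new ArrayList<>();

    @Override
    public void comment(char[] ch, int start, int length) {
        // Called once per comment, with the text between the comment delimiters.
        comments.add(new String(ch, start, length));
    }

    static List<String> collect(String xml) {
        try {
            // Stand-in parser: the JDK's XML parser, not TagSoup.
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            CommentCollector handler = new CommentCollector();
            reader.setContentHandler(handler);
            // Standard SAX2 property for registering a LexicalHandler.
            reader.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
            reader.parse(new InputSource(new StringReader(xml)));
            return handler.comments;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

A parser that supports the property would deliver HTML comments to the same `comment` callback, so application code written this way would not need to change.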
Hi, We have a project for a national archive to translate data into standard formats for long term archiving. One of these formats is HTML. Whilst we will keep...
I'll defer to John on the other questions... What does it mean "it does not convert presentation HTML to CSS"? I believe that means in cases like: ...
... Forgive my ignorance, but is the latter valid xhtml? If so, why would anybody want to change it? <center> was deprecated in HTML 4.01, from which XHTML is...
I'm not sure about JTidy, but the exe version of Tidy has an option - I just tried this: <center>text</center> Checking the "Output as XHTML" and "Replace...
... One possibility would be to use TagSoup as a prefilter for JTidy. The main danger is that TagSoup will mess up what JTidy would understand correctly,...
... Correct. ... In fact, the HTML 4.01 DTD says that a center element can contain another one, so this is left alone, and two end-tags get added at the next ...
... Tidy can cope with malformed start-tags better than TagSoup currently can; on occasion, TagSoup gets terminally confused about what's an attribute and...
... A quick and dirty hack is to add the following after line 343 of Parser.java (the call on theSchema.getElementType): if (type == null) return; As I say,...
--On Saturday, February 21, 2004 11:40 AM +1100 "Chris B." <chris@...> wrote: ... I've actually tried doing this (tagsoup as prefilter before jtidy)....
FWIW I've found a really good use for TagSoup. I don't know if this is at all novel or what, but I'm writing an online tutorial on XQuery, and I'm using...
Howard Katz
howardk@...
Feb 23, 2004 6:36 pm
40
... Fabulous! Can you mention, at least, what XQuery implementation you are using, and how you are persuading it to parse with TagSoup? -- John Cowan...
Sure, I'm using my own engine (who else's?! :-) The exercise has also helped me uncover some new bugs in my implementation. All it took to persuade my engine...
Howard Katz
howardk@...
Feb 23, 2004 8:57 pm
42
TagSoup maintains a stack of open elements, and knows which elements can be children of which. When a start-tag is found that can't be a child of the...
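The stack-of-open-elements idea can be sketched in a few lines. This is an illustration under stated assumptions, not TagSoup's implementation: the preview above is cut off before it says what happens to a start-tag that cannot be a child of the current element, so the recovery shown here (close open elements until one accepts the new tag) is just one plausible strategy. The class name `OpenElementStack` and the toy `ALLOWED` table are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration; not TagSoup's code.
class OpenElementStack {
    // Assumed toy schema: parent element -> elements allowed as its children.
    private static final Map<String, Set<String>> ALLOWED = Map.of(
        "html", Set.of("body"),
        "body", Set.of("p", "ul"),
        "ul", Set.of("li"),
        "li", Set.of("p", "ul"),
        "p", Set.of());

    private final Deque<String> open = new ArrayDeque<>();
    private final List<String> events = new ArrayList<>();

    void startTag(String name) {
        // Assumed recovery: emit end-tags until the innermost open element
        // may contain the new one (or the stack is empty).
        while (!open.isEmpty()
                && !ALLOWED.getOrDefault(open.peek(), Set.of()).contains(name)) {
            events.add("</" + open.pop() + ">");
        }
        open.push(name);
        events.add("<" + name + ">");
    }

    void endDocument() {
        // Close whatever is still open, innermost first.
        while (!open.isEmpty()) {
            events.add("</" + open.pop() + ">");
        }
    }

    List<String> events() {
        return events;
    }
}
```

Feeding it `html`, `body`, `p`, `p` closes the first paragraph before opening the second, which is the well-balanced event stream a SAX consumer expects.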
I have just released TagSoup 0.9.2 and TSaxon 0.9.2. The changes to TagSoup: No longer inserts bogus ; after unknown entity reference without ; Consecutive...
I've uploaded a test suite for TagSoup to the files area. (It's MIT-licensed, which should be compatible with everything else.) JUnit tests are included, as...
... Thank you! I'll move it to the TagSoup page on Monday, and probably incorporate it into the next release. ... Hmm, yes. I suppose it should generate...
Hi, I have tested the TagSoup parser on http://www.yahoo.fr and I was really surprised by the events for the HTML content: in the first script tag content you can...
... I'd like to see these changes and possibly incorporate them into the next release. Can you send them to me, please? -- "While staying with the Asonu, I...
Well, it's time for another public release of TagSoup, the SAX-compliant Java parser for nasty, ugly HTML. TagSoup 0.9.3 fixes most known bugs and provides ...