I seem to have found another place where TagSoup gets in a bit of a
huff. Perhaps there is a flag I can specify to make things better.
The issue is this piece of HTML, where the less-than sign is,
erroneously, a literal character rather than an entity:
<em><90 min</em>
and it is occurring on this page - http://tinyurl.com/dgmjjt
On first inspection it seems like TagSoup makes some sort of sense of it:
<em><_90 min="min" em="em">. </_90></em>
But then it starts inserting <em></em> pairs all over the document (this
was the only "em" element in the original doc). One of the inserted <em>
tags has no matching end tag, which is how I stumbled upon this when the
output hit the XML parser. Is there any simple resolution?
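
For what it's worth, one possible workaround is to escape stray
less-than signs before the markup ever reaches TagSoup. This is only a
rough sketch: it assumes a "<" that is not followed by a letter, "/",
"!" or "?" never starts real markup. The class and method names
(StrayLtEscaper, escapeStrayLt) are made up for illustration and are not
part of TagSoup.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of TagSoup: pre-escapes "<" characters
// that cannot start a tag, end tag, comment/doctype, or processing
// instruction, so the parser sees them as text.
class StrayLtEscaper {
    // "<" NOT followed by a letter, "/", "!" or "?".
    private static final Pattern STRAY_LT =
            Pattern.compile("<(?![A-Za-z/!?])");

    static String escapeStrayLt(String html) {
        return STRAY_LT.matcher(html).replaceAll("&lt;");
    }

    public static void main(String[] args) {
        // prints: <em>&lt;90 min</em>
        System.out.println(escapeStrayLt("<em><90 min</em>"));
    }
}
```

The obvious catch is that this would also rewrite a "<" inside script or
CDATA content (e.g. "a < 5" in JavaScript), so it is probably only safe
on pages where that does not matter.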
-Mike