Hi,
I've encountered an issue using TagSoup and I wanted to clarify whether it is
expected behaviour due to how I'm using it, or something else.
I'm parsing an RSS feed, and each item's content eventually goes through
TagSoup to ensure that I store well-formed XML. The feed in question is:
http://www.guardian.co.uk/football/2009/feb/26/real-madrid-rafa-benitez-liverpool/rss
The <br/> element between the first two bullet points in that story is removed
when I parse the <item/> description, and I'm not sure why. The description
contains:
"<p>• Liverpool manager says he will be staying at Anfield<br />•
Spaniard praises team for win away to Real Madrid</p>"
The markup is being correctly unescaped prior to being passed to TagSoup.
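To illustrate what I mean by "unescaped", here is a minimal sketch of that check
using only the JDK's DOM parser. It is not my actual code: the class name
DescriptionCheck and the cut-down SAMPLE feed are made up for illustration, and
TagSoup itself is deliberately left out so that the input can be inspected on
its own.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class DescriptionCheck {
    // Hypothetical, cut-down stand-in for the feed: the markup inside
    // <description> is entity-escaped, as is usual for RSS 2.0 items.
    static final String SAMPLE =
        "<rss version=\"2.0\"><channel><item><description>"
        + "&lt;p&gt;\u2022 Liverpool manager says he will be staying at Anfield"
        + "&lt;br /&gt;\u2022 Spaniard praises team for win away to Real Madrid"
        + "&lt;/p&gt;"
        + "</description></item></channel></rss>";

    // Returns the text of the first <description>; the XML parser resolves
    // the entities, so the result is the unescaped HTML fragment.
    static String description(String rss) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(rss)));
            return doc.getElementsByTagName("description").item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse feed", e);
        }
    }

    public static void main(String[] args) {
        String html = description(SAMPLE);
        // Print the fragment as it would be handed to TagSoup, then report
        // whether the <br /> is still present at this point.
        System.out.println(html);
        System.out.println("contains <br />: " + html.contains("<br />"));
    }
}
```

If the <br /> is present at this point, it is being lost somewhere after the
fragment is handed on, rather than during unescaping.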
Is there a source repository that I can check out anonymously and write some
tests against? I've not been able to find one through Google; the results are
swamped by the Haskell version, etc.
Cheers,
James