Sam Ruby recently created some aggregator test cases for Atom, which
he also ported to RSS 1.0 and RSS 2.0 [1]. Not surprisingly, they
turned up some failures in aggregators reading RSS 1.0 as well.
In these particular tests, Sam is using UTF-16 encoded feeds with the
word Internationalization spelled using a variety of international
characters and character references. In the Atom tests, Sam is using
an Atom feature that allows entries to include escaped HTML markup in
the <title> and <content> elements. In porting these to RSS, however,
Sam kept the test cases that included escaped HTML markup, which RSS
1.0 does not allow in <title> and <description> (although the Content
module does support it in content:encoded).
The issue is not the use of escaped markup in the tests themselves,
but how aggregators render the tests and how the results are being
interpreted. For several aggregators that were
tested using RSS 1.0, the testers indicated that the aggregator
"passed" if it rendered the "expected" international characters, even
in the escaped HTML markup.
The correct results would be for the characters of the escaped HTML
markup to appear literally to the user/tester. For example, this
title in Atom, not using the escaped-HTML feature:
<atom:title>An accented character: &amp;eacute;</atom:title>
and this title in RSS 1.0:
<rss1:title>An accented character: &amp;eacute;</rss1:title>
should appear to the user as:
An accented character: &eacute;
For any readers whose display isn't showing that, those are the
literal characters "& e a c u t e ;".
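To make the expected behavior concrete, here is a rough sketch in Python, using only the standard library. It is not taken from any particular aggregator; the variable names are made up for illustration, and the "buggy path" is only my guess at what the failing aggregators are effectively doing.

```python
import html
import xml.etree.ElementTree as ET

RSS1_NS = "http://purl.org/rss/1.0/"

# The title as it appears in the feed source: the ampersand is escaped once.
item_xml = (
    '<item xmlns="http://purl.org/rss/1.0/">'
    "<title>An accented character: &amp;eacute;</title>"
    "</item>"
)

item = ET.fromstring(item_xml)
title = item.find(f"{{{RSS1_NS}}}title").text

# The XML parser has already undone the one level of escaping that XML defines.
# What remains is plain text, so this is the string the user should see.
display_text = title

# Assumed buggy path: treating the plain-text title as HTML decodes it again.
double_decoded = html.unescape(title)

# For an HTML-based display, escape the text so it is shown literally.
html_fragment = html.escape(title)

print(display_text)    # An accented character: &eacute;
print(double_decoded)  # the accented character appears, which is the wrong result
print(html_fragment)   # An accented character: &amp;eacute;
```

The point is that the parser's output is already the final text for an RSS 1.0 title; any further entity decoding is a second, unwanted pass.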
What to do about this? Unfortunately, it is likely that this is a
widespread error.
The first step would be to file a bug report against the Feed
Validator, with test cases and patches, so that it warns about this
issue (it's not an error, as the characters themselves are valid, but
users might be unaware that they will be rendered literally).
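As a starting point for discussion, a warning check along these lines might look like the sketch below. This is not the Feed Validator's code or API; the function name markup_warnings and the pattern MARKUP_LIKE are hypothetical, and the pattern is a deliberately simple heuristic that will have false positives (for example, a title that really is about "&eacute;").

```python
import re
import xml.etree.ElementTree as ET

RSS1_NS = "http://purl.org/rss/1.0/"

# Heuristic: text that looks like an HTML tag, a named entity,
# or a numeric character reference after XML parsing.
MARKUP_LIKE = re.compile(
    r"</?[A-Za-z][^<>]*>"
    r"|&(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);"
)


def markup_warnings(feed_xml):
    """Return (element name, text) pairs whose parsed text looks like markup."""
    root = ET.fromstring(feed_xml)
    warnings = []
    for name in ("title", "description"):
        for element in root.iter(f"{{{RSS1_NS}}}{name}"):
            text = element.text or ""
            if MARKUP_LIKE.search(text):
                warnings.append((name, text))
    return warnings


sample = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/">
  <item rdf:about="http://example.org/1">
    <title>An accented character: &amp;eacute;</title>
    <description>Plain text with a real character: \u00e9</description>
  </item>
</rdf:RDF>"""

for name, text in markup_warnings(sample):
    print(f"warning: <{name}> contains text that looks like markup: {text!r}")
```

Only the title is flagged here; the description contains the real character and nothing that resembles markup.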
The next step would be to expand on Sam's tests with RSS 1.0 test
feeds, in particular using his earlier UTF-8 tests which weren't
ported to RSS 1.0.
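For what it's worth, generating such test feeds in more than one encoding could be scripted roughly as follows. This is only a sketch: the template, the example.org URLs, and the helper names build_test_feed and parsed_title are all invented for illustration and are not part of Sam's test suite.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

RSS1_NS = "http://purl.org/rss/1.0/"

# Hypothetical minimal RSS 1.0 test feed; only the item title varies.
TEMPLATE = """<?xml version="1.0" encoding="{encoding}"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/">
  <channel rdf:about="http://example.org/tests/feed.rdf">
    <title>Escaped markup test</title>
    <link>http://example.org/tests/</link>
    <description>Item titles must be shown literally.</description>
    <items>
      <rdf:Seq>
        <rdf:li rdf:resource="http://example.org/tests/1"/>
      </rdf:Seq>
    </items>
  </channel>
  <item rdf:about="http://example.org/tests/1">
    <title>{title}</title>
    <link>http://example.org/tests/1</link>
  </item>
</rdf:RDF>
"""


def build_test_feed(visible_title, encoding):
    """Return feed bytes whose item title should display as visible_title."""
    # escape() applies exactly one level of XML escaping (&, <, >).
    doc = TEMPLATE.format(encoding=encoding, title=escape(visible_title))
    return doc.encode(encoding)


def parsed_title(feed_bytes):
    """Parse the feed and return the item title as a reader should show it."""
    root = ET.fromstring(feed_bytes)
    return root.find(f"{{{RSS1_NS}}}item/{{{RSS1_NS}}}title").text


expected = "An accented character: &eacute;"
for encoding in ("utf-8", "utf-16"):
    feed = build_test_feed(expected, encoding)
    assert parsed_title(feed) == expected
```

Each generated feed carries its own expected result: whatever string is passed in as visible_title is what a correct aggregator should show.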
With the Feed Validator fixed and a suite of tests in place, we could
then begin guiding aggregator authors and feed producers in correcting
their software and feeds.
Thoughts?
-- Ken
[1] http://www.intertwingly.net/blog/2004/06/03/Aggregator-utf-16-tests