rss-public

Group Information

  • Members: 509
  • Category: XML
  • Founded: Jan 22, 2006
  • Language: English
#15 From: "Randy Morin" <randy@...>
Date: Wed Feb 1, 2006 3:39 pm
Subject: Re: Whose day should I skip?
randymorin
 
This is likely an area where we can clear up the spec. We have
several options, of which two are...
-Specify the handling of these elements more precisely.
-Accept that it's unused by both publishers and aggregators and
deprecate it.

Randy Charles Morin

--- In rss-public@yahoogroups.com, "rcade" <rcade@y...> wrote:
>
> Going all the way back to Netscape 0.91, I can't find a spec that
> indicates the timezone to use for <day>. Because skipDays and
> skipHours have been documented together in all revisions of 2.0.1 ...
>
> http://www.rssboard.org/skip-hours-days
>
> ... I probably would have implemented day using GMT because hour
> specifies it. Does anyone know of aggregators that support skipDays?
> I'll ask their developers what they did (and why).
>
> As we work on this, I'll post an outline of unanswered questions that
> are still being evaluated. Thanks for the quick feedback!
>
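
A minimal sketch of the GMT reading mentioned in the quoted message, assuming that <day> follows the same GMT clock that <hour> uses. No spec revision confirms that assumption; it is only the inclination stated above. The names `should_skip`, `skip_hours` and `skip_days` are hypothetical and belong to no aggregator.

```python
from datetime import datetime, timedelta, timezone

# Day names as they appear inside <skipDays>, indexed by datetime.weekday().
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def should_skip(now, skip_hours=(), skip_days=()):
    """Return True if a poll at `now` falls in a skipped hour or day.

    ASSUMPTION: both <hour> and <day> are evaluated in GMT.
    `now` must be timezone-aware so it can be converted.
    """
    if now.tzinfo is None:
        raise ValueError("now must be timezone-aware")
    gmt = now.astimezone(timezone.utc)
    return gmt.hour in skip_hours or DAY_NAMES[gmt.weekday()] in skip_days


# 8 pm on Saturday 4 Feb 2006 in a UTC-8 zone is 4 am Sunday in GMT.
pacific = timezone(timedelta(hours=-8))
local_saturday_evening = datetime(2006, 2, 4, 20, 0, tzinfo=pacific)

skips_sunday = should_skip(local_saturday_evening, skip_days={"Sunday"})
skips_saturday = should_skip(local_saturday_evening, skip_days={"Saturday"})
skips_hour_4 = should_skip(local_saturday_evening, skip_hours={4})
```

Under this reading a publisher who lists Sunday also suppresses polls on Saturday evening for readers west of Greenwich, which is exactly the ambiguity the thread is about.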

#16 From: "Randy Morin" <randy@...>
Date: Wed Feb 1, 2006 3:46 pm
Subject: Re: Which Channel Elements Can Be Multiple?
randymorin
 
Here's Dave's thoughts on the issue...

http://archive.scripting.com/2004/12/21#multipleenclosuresOnRssItems

...which align with Phil's and I agree with y'all. If it doesn't say
multiple, then it's singular.

Randy Charles Morin



--- In rss-public@yahoogroups.com, Phil Ringnalda <philringnalda@g...> wrote:
> I found Dave's take from some previous discussion, probably <enclosure>,
> completely persuasive: the only sane way to interpret having some
> elements which say that there may be more than one, and others which do
> not say so, is that the ones which do not say they may be repeated may
> not be repeated. There's no more reason in the spec to think that you
> may have multiple <cloud> elements than there is to think that you may
> have multiple <title> elements (which in my aggregator would get them
> concatenated, in yours would give only the first, and in his would give
> only the last, and is thus clearly not interoperable). +1 to "zero or
> one" for anything that doesn't explicitly say it can be repeated,
> including <enclosure>.
>
> Phil Ringnalda
>

#17 From: Sam Ruby <rubys@...>
Date: Wed Feb 1, 2006 4:23 pm
Subject: Re: Re: Which Channel Elements Can Be Multiple?
sa3ruby
 
Randy Morin wrote:
> Here's Dave's thoughts on the issue...
>
> http://archive.scripting.com/2004/12/21#multipleenclosuresOnRssItems
>
> ...which align with Phil's and I agree with y'all. If it doesn't say
> multiple, then it's singular.

You agree to what?  The original intended meaning of the spec, or that
the proposed revision could make this clearer?  Hopefully both, as the
fact that it isn't clear to Rogers as to how many cloud elements the
current spec allows per channel should be a good indication that this is
something that needs to be cleaned up.

- Sam Ruby

#18 From: Sam Ruby <rubys@...>
Date: Wed Feb 1, 2006 4:23 pm
Subject: Re: Re: Whose day should I skip?
sa3ruby
 
Randy Morin wrote:
> This is likely an area where we can clear up the spec. We have
> several options, of which two are...
> -Specify the handling of these elements more precisely.
> -Accept that it's unused by both publishers and aggregators and
> deprecate it.

First, kudos for being willing to at least vocalize the tough decisions
that need to be made.

Proposal: a change is made to the Feed Validator to flag the use of this
item with a warning and a link to an extended [help] page.  The extended
help page describes the time zone issue and suggests that anybody who
has any known use cases or can make suggestions as to spec text do so on
this (rss-public@yahoogroups.com) list.

You can even provide the wording for this.

If no use cases and/or acceptable suggestions are made to this list in
some period of time (say, 90 days), the element is marked deprecated in
the next revision of the spec.

- Sam Ruby

#19 From: "Randy Morin" <randy@...>
Date: Wed Feb 1, 2006 4:43 pm
Subject: Re: Which Channel Elements Can Be Multiple?
randymorin
 
The new spec should specify exact cardinality and it should be zero or
one for the channel and item optional elements that are not
specifically referred to in the plural.

Randy Charles Morin

--- In rss-public@yahoogroups.com, Sam Ruby <rubys@i...> wrote:
>
> You agree to what?

#20 From: "rcade" <rcade@...>
Date: Wed Feb 1, 2006 4:46 pm
Subject: Re: Which Channel Elements Can Be Multiple?
rcade
 
I've updated the proposed spec to reflect that all channel and item
children except for category can't be present more than once:

In channel:

http://www.rssboard.org/rss-draft-1#element-channel

"The preceding elements must not be present more than once in a
channel, with the exception of category."

In item:

http://www.rssboard.org/rss-draft-1#element-channel-item

"The preceding elements must not be present more than once in an item,
with the exception of category."

The "preceding elements" phrase is intended to keep the spec from
restricting namespaced elements unintentionally. Let me know if you
think this resolves the issue properly.

Note: This language would impact RSS implementors who concluded that
the 2.0.1-rv-6 spec allows multiple enclosures per item.
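
The proposed cardinality rule can be checked mechanically. What follows is a minimal sketch using Python's standard `xml.etree.ElementTree`; the function names and the sample feed are hypothetical. Namespaced children are ignored, which mirrors the stated intent of the "preceding elements" wording, and `item` is treated as repeatable inside `channel`.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def repeated_children(parent, repeatable):
    """Names of non-namespaced children that occur more than once."""
    counts = Counter(
        child.tag for child in parent
        if not child.tag.startswith("{")      # namespaced: not restricted
        and child.tag not in repeatable
    )
    return sorted(name for name, seen in counts.items() if seen > 1)


def check_feed(xml_text):
    """Return (location, element name) pairs that break the draft rule."""
    channel = ET.fromstring(xml_text).find("channel")
    problems = [("channel", name)
                for name in repeated_children(channel, {"category", "item"})]
    for index, item in enumerate(channel.findall("item"), start=1):
        problems += [(f"item {index}", name)
                     for name in repeated_children(item, {"category"})]
    return problems


# Hypothetical feed: two <cloud> elements and two <enclosure> elements.
SAMPLE = """<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Example</title>
    <link>http://example.org/</link>
    <description>Hypothetical feed</description>
    <category>one</category>
    <category>two</category>
    <cloud domain="a.example.org" port="80" path="/rpc"
           registerProcedure="notify" protocol="xml-rpc"/>
    <cloud domain="b.example.org" port="80" path="/rpc"
           registerProcedure="notify" protocol="xml-rpc"/>
    <item>
      <title>First</title>
      <enclosure url="http://example.org/a.mp3" length="1" type="audio/mpeg"/>
      <enclosure url="http://example.org/b.mp3" length="1" type="audio/mpeg"/>
      <dc:subject>x</dc:subject>
      <dc:subject>y</dc:subject>
    </item>
    <item>
      <title>Second</title>
    </item>
  </channel>
</rss>"""

problems = check_feed(SAMPLE)
```

Here the repeated `category` and `dc:subject` elements pass, while the second `cloud` and the second `enclosure` are reported.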

#21 From: "Randy Morin" <randy@...>
Date: Wed Feb 1, 2006 4:45 pm
Subject: OPML for all feeds
randymorin
 
I created an OPML file of all RSS feeds for the board members and
board Website and mailing lists. Did I miss anything?

http://www.kbcafe.com/rss/rssadvisoryboard.xml

Randy Charles Morin

#22 From: "Randy Morin" <randy@...>
Date: Wed Feb 1, 2006 5:15 pm
Subject: Re: Whose day should I skip?
randymorin
 
Sam,
Do you already have extended help pages in the FeedValidator for
<skipX>? If so, then would you mind sharing how many pageviews you
get?
Thanks,

Randy Charles Morin

--- In rss-public@yahoogroups.com, Sam Ruby <rubys@i...> wrote:
>
> Randy Morin wrote:
> > This is likely an area where we can clear up the spec. We have
> > several options, of which two are...
> > -Specify the handling of these elements more precisely.
> > -Accept that it's unused by both publishers and aggregators and
> > deprecate it.
>
> First, kudos for being willing to at least vocalize the tough decisions
> that need to be made.
>
> Proposal: a change is made to the Feed Validator to flag the use of this
> item with a warning and a link to an extended [help] page.  The extended
> help page describes the time zone issue and suggests that anybody who
> has any known use cases or can make suggestions as to spec text do so on
> this (rss-public@yahoogroups.com) list.
>
> You can even provide the wording for this.
>
> If no use cases and/or acceptable suggestions are made to this list in
> some period of time (say, 90 days), the element is marked deprecated in
> the next revision of the spec.
>
> - Sam Ruby
>

#23 From: Sam Ruby <rubys@...>
Date: Wed Feb 1, 2006 5:42 pm
Subject: Re: Re: Whose day should I skip?
sa3ruby
 
Randy Morin wrote:
> Sam,
> Do you already have extended help pages in the FeedValidator for
> <skipX>? If so, then would you mind sharing how many pageviews you
> get?

The Feed Validator only has extended help pages for conditions that
involve warnings or errors.  Three such messages are specific to
SkipHours and/or SkipDays:

http://feedvalidator.org/docs/error/EightDaysAWeek.html
http://feedvalidator.org/docs/error/InvalidHour.html
http://feedvalidator.org/docs/error/NotEnoughHoursInTheDay.html

I keep apache logs around for 7 days. In the past seven days here is the
traffic for these pages:

EightDaysAWeek was visited 165 times, 0 times with a referer that
includes the string "check".

InvalidHour was visited 170 times, 0 times with a referer that includes
the string "check".

NotEnoughHoursInTheDay was visited 79 times, 0 times with a referer that
includes the string "check".

Conclusion: these errors are not common, and the visitors are likely all
bots.  Or, starting now, people who follow links from this email.  ;-)

- Sam Ruby
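
A count like the one above could be produced along these lines. This is only a sketch: it assumes the logs are in Apache's combined format, the sample lines are invented for illustration, and nothing here is the script that was actually run.

```python
import re
from collections import Counter

# Request, status, size and referer fields of a combined-format log line.
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+)[^"]*" \d{3} \S+ "(?P<referer>[^"]*)"'
)


def tally(lines, pages):
    """Count visits per help page, and visits whose referer has "check"."""
    visits = Counter()
    via_check = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match is None:
            continue
        for page in pages:
            if match.group("path").endswith(f"/{page}.html"):
                visits[page] += 1
                if "check" in match.group("referer"):
                    via_check[page] += 1
    return visits, via_check


PAGES = ["EightDaysAWeek", "InvalidHour", "NotEnoughHoursInTheDay"]

# Hypothetical log lines, not taken from any real server.
SAMPLE_LOG = [
    '192.0.2.1 - - [01/Feb/2006:10:00:00 -0500] '
    '"GET /docs/error/InvalidHour.html HTTP/1.1" 200 2326 "-" "ExampleBot/1.0"',
    '192.0.2.2 - - [01/Feb/2006:10:05:00 -0500] '
    '"GET /docs/error/InvalidHour.html HTTP/1.1" 200 2326 '
    '"http://example.org/check?url=http://example.org/rss" "Mozilla/5.0"',
    '192.0.2.3 - - [01/Feb/2006:10:09:00 -0500] '
    '"GET /docs/error/EightDaysAWeek.html HTTP/1.1" 200 1800 "-" "ExampleBot/1.0"',
]

visits, via_check = tally(SAMPLE_LOG, PAGES)
```

A page with many visits but none arriving from a validator result page suggests the traffic is not coming from people who hit the error.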

#24 From: "rcade" <rcade@...>
Date: Thu Feb 2, 2006 4:13 pm
Subject: HTML Markup in RSS Documents
rcade
 
The history of HTML markup in Really Simple Syndication:

* 0.91 (Netscape): "We ... are not allowing any HTML markup beyond the
commonly used entities such as &quot; A full list of these are defined
in the RSS 0.91 DTD."

* 0.92: "0.92 allows entity-encoded HTML in the <description> of an
item, to reflect actual practice by bloggers, who are often proficient
HTML coders."

* 2.0: "entity-encoded HTML is allowed"

* 2.0.1-rv 4: "entity-encoded HTML is allowed; see examples"

So HTML markup was explicitly excluded from RSS in 0.91 (Netscape),
explicitly included for the item description element in 0.92, and the
only language regarding markup since 0.92 has been to clarify how to
encode markup in description.

My read on this: RSS documents MUST NOT contain HTML markup outside of
the item description element.

Also, the only character entities permitted in other elements are
those delineated here:

http://www.w3.org/TR/REC-html32.html#dtd

Because an RSS document is valid XML, it also MUST support five
additional entities: &amp; &gt; &lt; &apos; and &quot;:

http://www.w3.org/TR/REC-xml/#syntax

Any thoughts? The thing I've been asked most often since I joined the
RSS Advisory Board is what to do with HTML markup and text that might
be markup in places like an item title and channel description. The
issue has driven Phil Ringnalda crazy -- in December, he began using
"<" characters in all of his item titles as a one-man campaign against
"silent data loss":

http://weblog.philringnalda.com/2005/12/18/you-can-have-my-titles-when-you-learn-to-behave

#25 From: "Randy Morin" <randy@...>
Date: Thu Feb 2, 2006 4:32 pm
Subject: Re: HTML Markup in RSS Documents
randymorin
 
Just a clarification. <content:encoded> could include HTML. So could
<xhtml:body>. Better wording might be...

    RSS elements, other than item description and RSS extensions, MUST
NOT contain HTML markup.

That still might be a little confusing. Suggestions?

Randy

--- In rss-public@yahoogroups.com, "rcade" <rcade@...> wrote:
> My read on this: RSS documents MUST NOT contain HTML markup outside of
> the item description element.

#26 From: "James Holderness" <j4_james@...>
Date: Thu Feb 2, 2006 5:52 pm
Subject: Re: HTML Markup in RSS Documents
james_holder...
 
rcade wrote:
> My read on this: RSS documents MUST NOT contain HTML markup outside of
> the item description element.

I agree that's probably the most accurate way of interpreting the spec,
however it's worth considering the following quote from Wikipedia:

"Userland's RSS reader-generally considered as the reference
implementation-did not originally filter out HTML markup from feeds. As a
result, publishers began placing HTML markup into the titles and
descriptions of items in their RSS feeds. This behaviour has become widely
expected of readers, to the point of becoming a de facto standard."

I just did a quick check on a couple of aggregators (10). All but one
treated markup in the title as markup. So it would appear that the vast
majority of aggregators consider markup in a title valid. Adding "MUST NOT
contain HTML markup" to the spec may just result in a clear, unambiguous
spec that everyone ignores.

> Also, the only character entities permitted in other elements are
> those delineated here:
>
> http://www.w3.org/TR/REC-html32.html#dtd
>
> Because an RSS document is valid XML, it also MUST support five
> additional entities: &amp; &gt; &lt; &apos; and &quot;:

Are you implying that this is valid?

<description>&apos;R&eacute;sum&eacute;&apos;</description>

or this?

<description>&amp;apos;R&amp;eacute;sum&amp;eacute;&amp;apos;</description>

The first won't make it through an XML parser without adding a DTD and the
second won't make it through an HTML renderer because apos isn't an HTML
entity. If you meant something else entirely it'll probably help if you can
provide some examples - this whole double escaping thing can be very
confusing.

> Any thoughts? The thing I've been asked most often since I joined the
> RSS Advisory Board is what to do with HTML markup and text that might
> be markup in places like an item title and channel description.

My personal opinion: just allow HTML everywhere. I suspect that's what most
publishers are doing anyway and it certainly seems to be what most
aggregators are expecting.

Regards
James
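
Both failure modes described in this message can be reproduced with Python's standard library. This is a sketch, not any aggregator's behaviour: the strings below are a singly escaped and a doubly escaped description, `html.entities.name2codepoint` stands in for "entities an HTML 4 renderer knows", and the helper names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from html.entities import name2codepoint  # HTML 4.01 entity names

# Named HTML entities used directly in the XML source (single escaping).
SINGLE = "<description>&apos;R&eacute;sum&eacute;&apos;</description>"
# The same references with every ampersand escaped (double escaping).
DOUBLE = ("<description>&amp;apos;R&amp;eacute;sum&amp;eacute;&amp;apos;"
          "</description>")


def parse_text(source):
    """Return the element text, or None if the XML parser rejects it."""
    try:
        return ET.fromstring(source).text
    except ET.ParseError:
        return None


def unknown_html_entities(text):
    """Entity names in `text` that are not defined by HTML 4.01."""
    names = re.findall(r"&([A-Za-z][A-Za-z0-9]*);", text)
    return sorted({name for name in names if name not in name2codepoint})


single_text = parse_text(SINGLE)   # eacute is undefined without a DTD
double_text = parse_text(DOUBLE)   # parses; yields HTML-style references
leftover = unknown_html_entities(double_text)
```

The first string is rejected by the XML parser because `eacute` is not declared, and the second parses but still carries `apos`, which is not in the HTML 4.01 entity set.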

#27 From: Sam Ruby <rubys@...>
Date: Thu Feb 2, 2006 5:54 pm
Subject: Re: HTML Markup in RSS Documents
sa3ruby
 
rcade wrote:
> The history of HTML markup in Really Simple Syndication:
>
> * 0.91 (Netscape): "We ... are not allowing any HTML markup beyond the
> commonly used entities such as &quot; A full list of these are defined
> in the RSS 0.91 DTD."

The key words in that sentence are "defined in the RSS 0.91 DTD".  See
below.

> * 0.92: "0.92 allows entity-encoded HTML in the <description> of an
> item, to reflect actual practice by bloggers, who are often proficient
> HTML coders."
>
> * 2.0: "entity-encoded HTML is allowed"
>
> * 2.0.1-rv 4: "entity-encoded HTML is allowed; see examples"
>
> So HTML markup was explicitly excluded from RSS in 0.91 (Netscape),
> explicitly included for the item description element in 0.92, and the
> only language regarding markup since 0.92 has been to clarify how to
> encode markup in description.
>
> My read on this: RSS documents MUST NOT contain HTML markup outside of
> the item description element.

The question as to whether or not a given element in an RSS document can
contain HTML markup is a false dilemma.

Ian Hickson is certainly allowed to talk about <title> tags on his
weblog.  He could create a category named <title> (complete with the
angle brackets).

The question isn't whether he is allowed to include such discussions
about HTML markup in his feed, but how such information is to be
expressed in his feed.  Without a clear answer to this question, you end
up with a number of problems.  The feed for this mailing list, for
example, will often contain snippets containing tags or elements, like
<title>.  As entity-encoded plain text.  In ways that can't be reliably
distinguished from other feeds that may include entity-encoded HTML.

The real question is: how is the consumer supposed to know the original
intent of the person who composed the text?

Let's explore that with a tangible use case.  I notice that you fixed
one of the two problems I noted in the rssboard.org/rss-feed document.

In that feed, I now see: "Lo&amp;#239;c".

How should that be interpreted?  As "Lo&amp;#239;c"?  Or "Lo&#239;c" or
as "Loïc"?

Before you say "Of course, it should be interpreted as...", ask yourself
how I could express the following text unambiguously in an item's
description:

   In that feed, I now see: "Lo&amp;#239;c".

Related questions: how do I unambiguously express the name Loïc in a
<title>?  In an <author> element?

For best results, I would recommend <author>Loïc</author>.

> Also, the only character entities permitted in other elements are
> those delineated here:
>
> http://www.w3.org/TR/REC-html32.html#dtd

Full stop.

The only character entities permitted in other elements are ones defined
in the DTDs referenced by this document.  Such DTDs can be internal to
the document or external.

For NetScape's 0.91, the only named character entities which are defined
in addition to the five predefined by XML are the ones contained in

http://my.netscape.com/publish/formats/rss-0.91.dtd

As this DTD was dropped by UserLand's 0.91 none of these entities are
available to you unless you defined your own DTD, something that is
permitted by the spec, but something I wouldn't recommend.

Note: using NetScape's DTD with UserLand's 0.91 format is not
recommended as textinput is spelled differently, and image is no longer
optional in UserLand's 0.91, though this was relaxed in subsequent versions.

Using NetScape's DTD with any other version of RSS is simply a
non-starter, as none of the elements which were subsequently introduced
could be used.

> Because an RSS document is valid XML, it also MUST support five
> additional entities: &amp; &gt; &lt; &apos; and &quot;:
>
> http://www.w3.org/TR/REC-xml/#syntax

This is very much mixing up two different things.

Effectively, these are the ONLY named character references that can be
directly used in an RSS 2.0 document.  At least, in singly encoded plain
text.

What can be included in entity-encoded HTML which is then treated as
plain text and then entity-encoded (the common practice for item
descriptions)?  Well, then you can include the full set of entities
defined by HTML, which, by the way, does NOT include &apos;.

So &amp;iuml; is likely to be interpreted as "ï" in a description, but
as "&iuml;" when found as the text of an author element.

> Any thoughts? The thing I've been asked most often since I joined the
> RSS Advisory Board is what to do with HTML markup and text that might
> be markup in places like an item title and channel description. The
> issue has driven Phil Ringnalda crazy -- in December, he began using
> "<" characters in all of his item titles as a one-man campaign against
> "silent data loss":
>
> http://weblog.philringnalda.com/2005/12/18/you-can-have-my-titles-when-you-learn-to-behave

It is important to note that Phil is using a common ASCII character in
his titles.  He is most emphatically *not* using HTML markup.

- Sam Ruby
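
The difference between the two readings can be sketched in a few lines. The assumption here is the reading discussed in this thread, that only description carries entity-encoded HTML and every other element is plain text. `HTML_ELEMENTS` and `displayed_text` are hypothetical names, and `html.unescape` only resolves character references; it does not render tags as a real aggregator would.

```python
import html
import xml.etree.ElementTree as ET

# ASSUMPTION: description is the only element treated as escaped HTML.
HTML_ELEMENTS = {"description"}


def displayed_text(element):
    """Text a consumer would show for `element` under that assumption."""
    text = element.text or ""
    if element.tag in HTML_ELEMENTS:
        return html.unescape(text)   # second level of decoding
    return text                      # plain text: shown exactly as parsed


# The same doubly escaped content placed in two different elements.
ITEM = ET.fromstring(
    "<item>"
    "<author>Lo&amp;iuml;c</author>"
    "<description>Lo&amp;iuml;c</description>"
    "</item>"
)

author_view = displayed_text(ITEM.find("author"))
description_view = displayed_text(ITEM.find("description"))
```

With identical source text, the author shows a literal entity reference while the description shows the accented character, so a consumer cannot pick one treatment without knowing which element it is reading.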

#28 From: "rcade" <rcade@...>
Date: Thu Feb 2, 2006 6:32 pm
Subject: Re: HTML Markup in RSS Documents
rcade
 
--- In rss-public@yahoogroups.com, Sam Ruby <rubys@...> wrote:
> [lots of stuff on what a complex issue this is]

Is it your belief that an author of an RSS document must be able to
express the content type of an element's character data (text or HTML)
for aggregators to deal with it correctly?

#29 From: Sam Ruby <rubys@...>
Date: Thu Feb 2, 2006 6:45 pm
Subject: Re: Re: HTML Markup in RSS Documents
sa3ruby
 
rcade wrote:
> --- In rss-public@yahoogroups.com, Sam Ruby <rubys@...> wrote:
>
>>[lots of stuff on what a complex issue this is]
>
> Is it your belief that an author of an RSS document must be able to
> express the content type of an element's character data (text or HTML)
> for aggregators to deal with it correctly?

Not at all.

From http://www.intertwingly.net/blog/2004/05/28/detente

    If the spec were to be updated to merely say how various textual
    elements SHOULD be interpreted, I would gladly update the
    feedvalidator to provide informational messages when problematic
    values for these elements are detected.

This offer is still open.

- Sam Ruby

#31 From: Phil Ringnalda <philringnalda@...>
Date: Thu Feb 2, 2006 7:17 pm
Subject: Re: HTML Markup in RSS Documents
philringnalda
 
On 2/2/06, James Holderness <j4_james@...> wrote:
> I just did a quick check on a couple of aggregators (10). All but one
> treated markup in the title as markup.

I know of two which don't: Firefox (market share: ~10-20% depending)
and IE7 (market share: nearly all the rest, at some point in the
future). That's a good sized lever for shifting the rest, with a clear
spec as a place to stand.

Phil Ringnalda

#32 From: Sam Ruby <rubys@...>
Date: Thu Feb 2, 2006 9:03 pm
Subject: Re: HTML Markup in RSS Documents
sa3ruby
 
Phil Ringnalda wrote:
> On 2/2/06, James Holderness <j4_james@...> wrote:
>
>>I just did a quick check on a couple of aggregators (10). All but one
>>treated markup in the title as markup.
>
> I know of two which don't: Firefox (market share: ~10-20% depending)
> and IE7 (market share: nearly all the rest, at some point in the
> future). That's a good sized lever for shifting the rest, with a clear
> spec as a place to stand.

As Rogers points out, careful reading of history would indicate that the
intention of the spec is that item/description is the only element which
is to be interpreted as containing entity encoded HTML, all other
elements are to be interpreted as containing entity encoded plain text.

Furthermore, I will assert that very few feeds would be affected by
this.  Yes, there would be several consumers which would have to change,
but very few feeds.

A clear spec, the two levers that Phil mentions above, a set of
conformance tests (the ones that Phil wrote for Atom could readily be
adapted), and the additional checks that the Feed Validator could make
based on this clarification, and silent data loss associated with RSS
2.0 could someday become a thing of the past.

- Sam Ruby

#33 From: "James Holderness" <j4_james@...>
Date: Fri Feb 3, 2006 1:19 am
Subject: Re: HTML Markup in RSS Documents
james_holder...
 
Phil Ringnalda wrote:
> I know of two which don't: Firefox (market share: ~10-20% depending)
> and IE7 (market share: nearly all the rest, at some point in the
> future).

Bah. I just uninstalled IE7 earlier today. Was causing too many problems on
my system. Firefox I don't usually test. Should have known those would be
the two that disproved the rule.

Btw, where do you get your stats from? The only numbers I've seen are from
that 6 month old article listing feedburner stats [1] which has Firefox in
the 4-7% range.

> That's a good sized lever for shifting the rest, with a clear
> spec as a place to stand.

You could be right. Or you may find that IE7 changes their interpretation
when they find it causes problems for them in existing feeds. Only time will
tell.

FWIW I've had a chance to test some of the online aggregators which adds
another four to the side of "markup in a title is markup". It will be
interesting to see how many of these have changed their interpretation a
year from now.

Regards
James

[1]
http://itmanagement.earthweb.com/columns/executive_tech/article.php/3517646

#34 From: Phil Ringnalda <philringnalda@...>
Date: Fri Feb 3, 2006 1:47 am
Subject: Re: HTML Markup in RSS Documents
philringnalda
 
On 2/2/06, James Holderness <j4_james@...> wrote:
> Btw, where do you get your stats from?

I cheat, and use browser statistics rather than feed statistics when
it suits me.

There's no way of guessing what percentage of eventual IE7 users will
use it for feed reading, but I doubt that they'll do a quick flip-flop
on the content model of titles: unlike some BigCo RSS products, IE7's
RSS stuff doesn't give the impression of being thrown together on
impossible deadline by people who don't understand anything about
feeds. I think it was deliberate, just like the deliberate decision to
only accept malformed XML when it's malformed by virtue of text/xml.
They've got plenty of people and bandwidth for testing against the
world, and plenty of awareness of what sort of content currently gets
thrown between <rss> and </rss>; I'm surprised that they're willing to
go to bat on principle for this, but I don't think they'll be
surprised by the complaints.

Phil Ringnalda

#35 From: Sam Ruby <rubys@...>
Date: Fri Feb 3, 2006 2:48 am
Subject: Re: HTML Markup in RSS Documents
sa3ruby
 
James Holderness wrote:
>
> FWIW I've had a chance to test some of the online aggregators which adds
> another four to the side of "markup in a title is markup". It will be
> interesting to see how many of these have changed their interpretation a
> year from now.

I doubt the rule is as cut and dry as "markup in a title is markup".
Take a look at the conclusion Tim provides in the following:

http://www.tbray.org/ongoing/When/200x/2004/03/16/EscMad

What likely is happening under the covers is that various tools have
evolved different coping mechanisms.  After all, is <gorilla>
escaped markup?

It is amazing what can happen over time when clear specifications,
validators, test suites, and a few high profile applications all pull
together.  Atom has shown that progress in this area is possible.

Furthermore, most people have avoided angle brackets in titles, so very
little data will have to change.  There may be some instances of strings
like "Lo&#239;c", but that too is likely to be relatively small.

- Sam Ruby

#36 From: "James Holderness" <j4_james@...>
Date: Fri Feb 3, 2006 3:28 am
Subject: Re: HTML Markup in RSS Documents
james_holder...
 
Phil Ringnalda wrote:
> I cheat, and use browser statistics rather than feed statistics when
> it suits me.

Fair enough. I don't think feedburner's statistics are necessarily any
better. I was just hoping somebody might have published a new study or
something interesting.

> IE7's RSS stuff doesn't give the impression of being thrown together
> on impossible deadline by people who don't understand anything
> about feeds. I think it was deliberate

You could be right. It's worth noting that they also chose to use the feed
uri as the base uri for relative references. Technically the correct
decision (well mostly), but probably least likely to work. If that was
deliberate it does tend to suggest they're trying to make the right decision
rather than the most sensible decision.
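
To illustrate why the base URI matters, here is a small sketch using `urllib.parse.urljoin`. The URLs are invented, and treating the channel link as the alternative base is an assumption about what publishers tend to expect, not something any spec states.

```python
from urllib.parse import urljoin

# Hypothetical feed location and channel <link> for the same site.
FEED_URI = "http://example.org/feeds/rss.xml"
CHANNEL_LINK = "http://example.org/blog/"

# A relative reference as it might appear inside an item description.
RELATIVE = "images/photo.png"

resolved_against_feed = urljoin(FEED_URI, RELATIVE)
resolved_against_link = urljoin(CHANNEL_LINK, RELATIVE)
```

The two results differ (`/feeds/images/photo.png` versus `/blog/images/photo.png`), so a feed written with the site in mind breaks when the feed URI is used as the base.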

> just like the deliberate decision to only accept malformed XML
> when it's malformed by virtue of text/xml.

Actually I wasn't too impressed with their XML support. Of all my test
feeds, the ones that gave them the most trouble (to the extent that they
flat out refused to parse them) were the ones testing features of XML.
Nothing too obscure either - this is stuff that IE6 could process quite
happily. My first impression was that they had probably written some kind of
regex XML parser to cope with broken feeds and hadn't done a full XML
implementation yet. I never got around to checking whether they could
actually handle malformed XML though.

Regards
James

#37 From: "James Holderness" <j4_james@...>
Date: Fri Feb 3, 2006 4:46 am
Subject: Re: HTML Markup in RSS Documents
james_holder...
 
Sam Ruby wrote:
> I doubt the rule is as cut and dry as "markup in a title is markup".

True. I have test cases for a lot more than just simple markup in a title
and aggregators vary quite widely in their range of interpretation (for
example I've seen double escaping interpreted in probably five different
ways). For the most part, though, markup is still just treated as markup.

> What likely is happening under the covers is that various tools have
> evolved different coping mechanisms.  After all, is <gorilla>
> escaped markup?

In almost all of my tests, yes.

> It is amazing what can happen over time when clear specifications,
> validators, test suites, and a few high profile applications all pull
> together.  Atom has shown that progress in this area is possible.

Now that it's no longer front page news, I think the Atom community has kind
of lost interest in this whole compliance idea. I still think it's worth
making the effort though. Does the RSS board intend to set up something
similar for RSS?

> Furthermore, most people have avoided angle brackets in titles, so very
> little data will have to change.  There may be some instances of strings
> like "Lo&#239;c", but that too is likely to be relatively small.

It's hard to say. I've certainly seen a number of fancy quotes encoded like
that. Feeds from radio.weblogs.com seem to be a good source (or at least
they used to be - it's possible that has changed). This is the sort of thing
that I wish someone like syndic8 would track.

Anyway, I think it's obvious I'm not going to persuade anyone that this is a
bad idea and in the long run it'll probably all sort itself out. In the
meantime though, I think I'll just stick with "markup in a title is markup"
until the trend shifts in the other direction.

Regards
James

#38 From: "rcade" <rcade@...>
Date: Fri Feb 3, 2006 3:12 pm
Subject: Aggregator Test: Displaying a Diaresis
rcade
 
As Sam Ruby has pointed out, Loïc Le Meur's name is a good test of a
simple encoding issue faced by RSS publishers: When a word contains a
character with a diaeresis or similar mark, how do you properly format
the word so that the character is presented correctly by aggregators?

I tested this with single escaping ("Lo&#239;c Le Meur") and double
escaping ("Lo&amp;#239;c Le Meur") in Internet Explorer, Firefox,
Bloglines, My Yahoo and NewsGator Online.

Test files

http://www.rssboard.org/files/test-single-escaped.xml
http://www.rssboard.org/files/test-double-escaped.xml

The Feed Validator reports both files as valid RSS, but the
double-escaped version has two warnings that channel title and item
title should not contain HTML.
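
For readers who want to build similar test input, the two escaping levels can be generated with Python's standard library. This is a sketch under stated assumptions: `title_source` is a hypothetical helper, and the output is not claimed to match the contents of the test files linked above.

```python
import xml.etree.ElementTree as ET


def title_source(text):
    """Serialize `text` as a <title> element in ASCII-only XML."""
    title = ET.Element("title")
    title.text = text
    # The default (us-ascii) output writes non-ASCII characters as
    # numeric character references and escapes any ampersand.
    return ET.tostring(title).decode("ascii")


NAME = "Lo\u00efc Le Meur"   # the name with a literal i-diaeresis

# Single escaping: the character itself becomes a numeric reference.
single_source = title_source(NAME)

# Double escaping: the text already holds a reference, so its ampersand
# is escaped a second time when it is written out.
double_source = title_source("Lo&#239;c Le Meur")

single_parsed = ET.fromstring(single_source).text
double_parsed = ET.fromstring(double_source).text
```

After one XML parse the single-escaped title is the name itself, while the double-escaped title is still a character reference that only an HTML-aware consumer would resolve.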

I'm going to do more tests on other aggregators, and if anyone else
would like to do the same, that would be helpful. I'll invite some
aggregator developers to look this over as well.

Aggregator tests

1. Bloglines

INPUT: Lo&#239;c Le Meur (single escaped)

PRESENTATION:

Channel Title: Loïc Le Meur
Channel Description: A weblog about Loïc Le Meur
Item Title: Loïc Le Meur Joins RSS Board
Item Author: By loic@... (Lo[bad character]c Le Meur)
Item Description: Loïc Le Meur has joined the RSS Board
Item Category: on Loïc Le Meur

INPUT: Lo&amp;#239;c Le Meur (double escaped)

PRESENTATION:

Channel Title: Loïc Le Meur
Channel Description: A weblog about Loïc Le Meur
Item Title: Loïc Le Meur Joins RSS Board
Item Author: By loic@... (Lo[bad character]c Le Meur)
Item Description: Loïc Le Meur has joined the RSS Board
Item Category: on Loïc Le Meur

CONCLUSION: Single- and double-escaping produce the same result. All
elements present his name correctly except for item author, which
contains a bad character under either escaping.

2. Internet Explorer

INPUT: Lo&#239;c Le Meur (single escaped)

OUTPUT:

Channel Title: Loïc Le Meur
Channel Description: A weblog about Loïc Le Meur
Item Title: Loïc Le Meur Joins RSS Board
Item Author: - loic@... (Loïc Le Meur)
Item Description: Loïc Le Meur has joined the RSS Board
Item Category: Loïc Le Meur

INPUT: Lo&amp;#239;c Le Meur (double escaped)

OUTPUT:

Channel Title: Lo&#239;c Le Meur
Channel Description: A weblog about Lo&#239;c Le Meur
Item Title: Lo&#239;c Le Meur Joins RSS Board
Item Author: - loic@... (Lo&#239;c Le Meur)
Item Description: Loïc Le Meur has joined the RSS Board
Item Category: Lo&#239;c Le Meur

CONCLUSION: Single-escaping presents his name correctly on all
elements. Double-escaping doesn't, except for item description.

3. Firefox 1.0.5.1

INPUT: Lo&#239;c Le Meur (single escaped)

OUTPUT:

Channel Title: unsupported
Channel Description: unsupported
Item Title: Loïc Le Meur Joins RSS Board
Item Author: unsupported
Item Description: unsupported
Item Category: unsupported

INPUT: Lo&#239;c Le Meur (double escaped)

OUTPUT:

Channel Title: unsupported
Channel Description: unsupported
Item Title: Loïc Le Meur Joins RSS Board
Item Author: unsupported
Item Description: unsupported
Item Category: unsupported

CONCLUSION: Firefox only supports item titles. Single-escaping
presents his name correctly, double-escaping doesn't.

4. My Yahoo

INPUT: Loïc Le Meur (single escaped)

OUTPUT:

Channel Title: Lo[bad character]c Le Meur (on Add Content form), no
title at all (on My Yahoo home page after adding content)
Channel Description: unsupported
Item Title: Loïc Le Meur Joins RSS Board
Item Author: Unsupported
Item Description: Unsupported
Item Category: Unsupported

INPUT: Lo&#239;c Le Meur (double escaped)

OUTPUT:

Channel Title: Loïc Le Meur (on Add Content form), no title at
all (on My Yahoo home page after adding content)
Channel Description: unsupported
Item Title: Loïc Le Meur Joins RSS Board
Item Author: Unsupported
Item Description: Unsupported
Item Category: Unsupported

CONCLUSION: Single- and double-escaping work for item titles and
don't work for channel titles.

5. NewsGator Online

INPUT: Loïc Le Meur (single escaped)

OUTPUT:

Channel Title: Loïc Le Meur
Channel Description: unsupported
Item Title: Loïc Le Meur Joins RSS Board
Item Author: [loic@... (Loïc Le Meur)]
Item Description: Loïc Le Meur has joined the RSS Board
Item Category: Unsupported

INPUT: Lo&#239;c Le Meur (double escaped)

OUTPUT:

Channel Title: Loïc Le Meur (on My Feeds page), Loïc Le Meur (on
My Feeds sidebar), Loïc Le Meur (on feed display page)
Channel Description: unsupported
Item Title: Loïc Le Meur Joins RSS Board
Item Author: [loic@... (Loïc Le Meur)]
Item Description: Loïc Le Meur has joined the RSS Board
Item Category: Unsupported

CONCLUSION: Single- and double-escaping produce the same result with
one exception. All elements present his name correctly except for one
page where the channel title appears under double-escaping.
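
For anyone who wants to see the two inputs side by side, here is a minimal Python sketch. It assumes "single escaped" means the feed source carries the character reference once (Lo&#239;c) and "double escaped" means the ampersand is escaped a second time (Lo&amp;#239;c). The helper names are hypothetical, and the second decoding pass only models an aggregator that treats element text as HTML; it is not a description of how any of the aggregators tested above actually work.

```python
import html
import xml.etree.ElementTree as ET

# Assumed feed-source forms of the same title (see the note above).
SINGLE = "<title>Lo&#239;c Le Meur</title>"
DOUBLE = "<title>Lo&amp;#239;c Le Meur</title>"


def parsed_text(xml_fragment):
    """Return the element text after XML parsing (one round of unescaping)."""
    return ET.fromstring(xml_fragment).text


def rendered_as_html(xml_fragment):
    """Model a reader that treats the parsed text as HTML and unescapes again."""
    return html.unescape(parsed_text(xml_fragment))


for label, source in (("single", SINGLE), ("double", DOUBLE)):
    print(label, "| as text:", repr(parsed_text(source)),
          "| as HTML:", repr(rendered_as_html(source)))
```

Under these assumptions the single-escaped title reads the same either way, while the double-escaped title only reads correctly in a reader that decodes it a second time.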

#39 From: "rcade" <rcade@...>
Date: Fri Feb 3, 2006 3:40 pm
Subject: Re: Aggregator Test: Displaying a Diaresis
rcade
 
Ugh. Yahoo Groups accepts a diaeresis character on preview but not in
the actual e-mail.

Here's an HTML version of my post:

http://www.rssboard.org/files/diaresis-test.html

#40 From: Sam Ruby <rubys@...>
Date: Fri Feb 3, 2006 3:45 pm
Subject: Re: Aggregator Test: Displaying a Diaresis
sa3ruby
 
rcade wrote:
>
> The Feed Validator reports both files as valid RSS, but the
> double-escaped version has two warnings that channel title and item
> title should not contain HTML.

At the moment (and as your results confirm) best interoperability is
obtained when titles are single escaped.  If clarifications emerge from
this process, the Feed Validator will be updated accordingly.

> I'm going to do more tests on other aggregators, and if anyone else
> would like to do the same, that would be helpful. I'll invite some
> aggregator developers to look this over as well.

A great way to encourage participation and to share results is via a
Wiki.  I once set one up for you for your FDML initiative.  (FYI: those
pages are now hidden as they were long since taken over by spammers).

If you would like, I can quickly set you up a wiki (separate from
Atom's).  Alternately, rssboard.org could host a wiki.  There are
several out there that are easy to set up.

- Sam Ruby

#41 From: "rcade" <rcade@...>
Date: Fri Feb 3, 2006 3:52 pm
Subject: Re: Aggregator Test: Displaying a Diaresis
rcade
 
--- In rss-public@yahoogroups.com, Sam Ruby <rubys@...> wrote:
> A great way to encourage participation and to share results is via a
> Wiki.  I once set one up for you for your FDML initiative.  (FYI: those
> pages are now hidden as they were long since taken over by spammers).
>
> If you would like, I can quickly set you up a wiki (separate from
> Atom's).  Alternately, rssboard.org could host a wiki.  There are
> several out there that are easy to set up.

That's a good idea. I'd like to set one up on rssboard.org, preferably
driven on Linux/Apache/MySQL/PHP like the rest of the site. Any
recommendations?

#42 From: Sam Ruby <rubys@...>
Date: Fri Feb 3, 2006 4:16 pm
Subject: Re: Re: Aggregator Test: Displaying a Diaresis
sa3ruby
 
rcade wrote:
> --- In rss-public@yahoogroups.com, Sam Ruby <rubys@...> wrote:
>
>>A great way to encourage participation and to share results is via a
>>Wiki.  I once set one up for you for your FDML initiative.  (FYI: those
>>pages are now hidden as they were long since taken over by spammers).
>>
>>If you would like, I can quickly set you up a wiki (separate from
>>Atom's).  Alternately, rssboard.org could host a wiki.  There are
>>several out there that are easy to set up.
>
> That's a good idea. I'd like to set one up on rssboard.org, preferably
> driven on Linux/Apache/MySQL/PHP like the rest of the site. Any
> recommendations?

For your purposes, http://phpwiki.sourceforge.net/ may be best:

   "Installation of PhpWiki is as simple as untarring the source
   distribution. PhpWiki works right out of the box. You will want to
   make configuration changes later for better performance and
   permanence, or to run PhpWiki off a relational database like MySQL,
   mSQL or Postgresql."

I've not used it myself, but I know others that have.

- Sam Ruby

#43 From: "James Holderness" <j4_james@...>
Date: Fri Feb 3, 2006 8:52 pm
Subject: Re: Aggregator Test: Displaying a Diaresis
james_holder...
 
rcade wrote:
>I tested this with single escaping ("Loïc Le Meur") and double
>escaping ("Lo&#239;c Le Meur") in Internet Explorer, Firefox,
>Bloglines, My Yahoo and NewsGator online.

I haven't had a chance to try your specific tests but I have done some
escaping tests (titles only) on various aggregators which might be of
interest to you. The most important thing that I established is that there
are a number of different ways in which to escape things (I have 20+ tests)
and different aggregators will fail on different sets of those tests.

For example Bloglines, which passed your double escaping test, failed on 8
of mine. Even Firefox, which you would expect to treat everything as plain
text, managed to break on 8 of the tests (a different set to Bloglines).
They all managed to pass the single escaping tests though, so if you're a
feed producer, that's probably what you should be going with.
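
As an illustration of how many spellings one character can have in a feed, the following Python sketch parses a few variants of the same title. These are not the 20+ tests mentioned above, which this message does not list; the variant names and the helper are made up for the example, and a standard XML parser stands in for an aggregator.

```python
import xml.etree.ElementTree as ET

# Hypothetical variants: five ways a feed might spell the same title.
VARIANTS = {
    "literal": "<title>Lo\u00efc Le Meur</title>",
    "decimal": "<title>Lo&#239;c Le Meur</title>",
    "hex": "<title>Lo&#xEF;c Le Meur</title>",
    "cdata": "<title><![CDATA[Lo\u00efc Le Meur]]></title>",
    "named": "<title>Lo&iuml;c Le Meur</title>",
}


def parse_title(fragment):
    """Return the parsed title text, or None if the fragment is not well-formed."""
    try:
        return ET.fromstring(fragment).text
    except ET.ParseError:
        return None


results = {name: parse_title(xml) for name, xml in VARIANTS.items()}
for name, text in results.items():
    print(f"{name:8} -> {text!r}")
```

The first four variants parse to the same string. The named entity is defined in HTML but not in plain XML, so a strict XML parser rejects it, which is one way different readers can end up disagreeing about the same feed.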

Regards
James

#44 From: Sam Ruby <rubys@...>
Date: Sat Feb 4, 2006 2:19 pm
Subject: Re: Aggregator Test: Displaying a Diaresis
sa3ruby
 
James Holderness wrote:
>
> They all managed to pass the single escaping tests though, so if you're a
> feed producer, that's probably what you should be going with.

Until you want to include a less-than character, and then nothing works
consistently.
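
A rough Python sketch of why a less-than character is awkward, assuming two kinds of reader: one that shows the parsed title as plain text, and one that treats it as HTML. The function names are hypothetical and the HTML reader is deliberately crude (drop anything that looks like a tag, then decode entity references); real aggregators will differ.

```python
import html
import re
import xml.etree.ElementTree as ET

# The intended title is: Using the <blink> tag
SINGLE = "<title>Using the &lt;blink&gt; tag</title>"
DOUBLE = "<title>Using the &amp;lt;blink&amp;gt; tag</title>"


def as_plain_text(fragment):
    """Reader that displays the parsed element text unchanged."""
    return ET.fromstring(fragment).text


def as_html(fragment):
    """Crude model of a reader that interprets the parsed text as HTML."""
    text = ET.fromstring(fragment).text
    without_tags = re.sub(r"<[^>]+>", "", text)  # markup is consumed, not shown
    return html.unescape(without_tags)


for label, source in (("single", SINGLE), ("double", DOUBLE)):
    print(label, "| plain:", repr(as_plain_text(source)),
          "| html:", repr(as_html(source)))
```

In this model the single-escaped title is right for the plain-text reader but loses "<blink>" in the HTML reader, and the double-escaped title is right for the HTML reader but shows raw "&lt;blink&gt;" in the plain-text one. Neither form is safe for both.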

Rogers has been pursuing the original intent of the specs in a manner
that would make the most conservative jurist proud.  I hope he keeps
it up.

Note: I would prefer if instead of continuing to update draft-1, he
created new drafts.

- Sam Ruby
