data-types-url
Message #445 of 1975
To avoid the necessity of chasing around multiple specifications and
interpreting things based on what isn't said, a simple statement should
be added to the data-types-url section of the spec:

IRIs MUST be converted to URIs before being included in an RSS 2.0
document.

Perhaps with a hypertext link to
http://www.apps.ietf.org/rfc/rfc3987.html#sec-3.2

Background: domain names in URIs have always been restricted to the
US-centric ASCII character set. Understandably, there has been a
growing demand for domain names that include characters found in
non-English languages. From RFC 3987:

The characters in URIs are frequently used for representing words of
natural languages. This usage has many advantages: Such URIs are
easier to memorize, easier to interpret, easier to transcribe, easier
to create, and easier to guess. For most languages other than
English, however, the natural script uses characters other than A -
Z. For many people, handling Latin characters is as difficult as
handling the characters of other scripts is for those who use only
the Latin alphabet. Many languages with non-Latin scripts are
transcribed with Latin letters. These transcriptions are now often
used in URIs, but they introduce additional ambiguities.

As an example, see:

http://www.atemschutzunfälle.de/asu.rdf

Despite the rdf extension, this actually is a valid RSS 0.93 feed.

Because of concerns about breaking existing software, this was
approached in two phases. RFC 3490 specifies a backwards-compatible
mechanism for Internationalizing Domain Names in Applications (IDNA).
It involves encoding the non-ASCII characters in a special way. The
domain name above, which contains an umlaut, gets encoded thus:
www.xn--atemschutzunflle-7nb.de
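
For anyone who wants to reproduce that encoding, here is a minimal sketch in Python. It assumes the built-in "idna" codec (which implements the 2003 flavour of IDNA) is acceptable for your purposes; the helper name host_to_ascii is just something I made up for illustration.

```python
def host_to_ascii(host: str) -> str:
    """Return the ASCII-compatible form of a host name.

    Assumption: Python's built-in "idna" codec is good enough here.
    It encodes each dot-separated label on its own and leaves labels
    that are already pure ASCII unchanged.
    """
    return host.encode("idna").decode("ascii")


print(host_to_ascii("www.atemschutzunfälle.de"))
# prints: www.xn--atemschutzunflle-7nb.de
```

Only the label containing the umlaut changes; "www" and "de" pass through untouched.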

The other way forward was captured in RFC 3987, and it allows such
characters to be included directly into IRIs. Quoting from that RFC:

a. A protocol or format element should be explicitly designated to
be able to carry IRIs. The intent is not to introduce IRIs into
contexts that are not defined to accept them. For example, XML
schema [XMLSchema] has an explicit type "anyURI" that includes
IRIs and IRI references. Therefore, IRIs and IRI references can
be in attributes and elements of type "anyURI". On the other
hand, in the HTTP protocol [RFC2616], the Request URI is defined
as a URI, which means that direct use of IRIs is not allowed in
HTTP requests.

Including IRIs as the url attribute of enclosure elements would quite
likely break existing software. As that was not the intent of IRIs, any
IRI needs to be mapped to a URI first.
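
To make "mapped to a URI" concrete, here is a rough sketch of what a feed producer might do before writing a url value. It is an approximation in the spirit of the RFC 3987 mapping, not a complete implementation: the function name iri_to_uri and the _SAFE set are my own, the host is converted with Python's built-in "idna" codec, and everything else outside ASCII is percent-encoded as UTF-8.

```python
from urllib.parse import quote, urlsplit, urlunsplit

# Assumption: leave RFC 3986 reserved characters and "%" alone, so that
# delimiters keep their meaning and existing escapes are not encoded twice.
_SAFE = "/?:@!$&'()*+,;=%[]"


def iri_to_uri(iri: str) -> str:
    """Map an IRI to an ASCII-only URI (illustrative sketch)."""
    parts = urlsplit(iri)

    host = parts.hostname or ""
    if ":" in host:
        # IPv6 literal: urlsplit strips the brackets, so put them back.
        host = f"[{host}]"
    elif host:
        # Non-ASCII labels become their "xn--" form.
        host = host.encode("idna").decode("ascii")

    netloc = host
    if parts.port is not None:
        netloc = f"{netloc}:{parts.port}"
    if parts.username is not None:
        userinfo = quote(parts.username, safe=_SAFE)
        if parts.password is not None:
            userinfo += ":" + quote(parts.password, safe=_SAFE)
        netloc = f"{userinfo}@{netloc}"

    # quote() encodes non-ASCII characters as percent-escaped UTF-8 bytes.
    return urlunsplit((
        parts.scheme,
        netloc,
        quote(parts.path, safe=_SAFE),
        quote(parts.query, safe=_SAFE),
        quote(parts.fragment, safe=_SAFE),
    ))


print(iri_to_uri("http://www.atemschutzunfälle.de/asu.rdf"))
# prints: http://www.xn--atemschutzunflle-7nb.de/asu.rdf
```

A URI that is already pure ASCII comes out unchanged, apart from the host being lowercased by urlsplit, so running every url value through a function like this should be harmless for existing feeds.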

Again, I don't think all this background needs to be included in the
spec, but a simple statement like the one suggested above would be
appropriate.

Test cases:

http://feedvalidator.org/testcases/rss20/data-types-url/

- Sam Ruby



Tue Feb 28, 2006 12:23 pm
sa3ruby


... I would say the background is very important. The simple statement would have meant nothing to me. Andy...
Andy Henderson
andyatcita
Feb 28, 2006
3:24 pm

... Is there a workaround that RSS publishers in other languages are using so that they may use IRIs as URLs in RSS, or are they simply forced to employ URLs...
rcade
Feb 28, 2006
4:47 pm

... Unless you are setting out to change what RSS 2.0 is, one need only look at the baseline 2.0.1-rv-6 (a.k.a. "Harvard") spec for the enclosure element,...
Sam Ruby
sa3ruby
Feb 28, 2006
5:14 pm

... I think that the following sentence in data-types-urls serves the same purpose without sounding like a new requirement for RSS implementers: These elements...
rcade
Mar 1, 2006
11:35 am

... I am fine with that wording. Now let's look at how these two suggestions can be combined. These elements MUST NOT contain IRIs. IRIs MUST be converted to...
Sam Ruby
sa3ruby
Mar 1, 2006
11:49 am

... Upon further reflection, that sentence is misleading. The set of valid IRIs is a proper superset of the set of valid URIs. So disallowing IRIs would...
Sam Ruby
sa3ruby
Mar 1, 2006
12:13 pm

... I think the correct wording for the spec would be that IRIs with non-ASCII characters MUST be given in their punycode-encoded URI representation. Regards, ...
A. Pagaltzis
a22pag
Mar 1, 2006
12:27 pm