rss-public

Group Information

  • Members: 509
  • Category: XML
  • Founded: Jan 22, 2006
  • Language: English
#445 From: Sam Ruby <rubys@...>
Date: Tue Feb 28, 2006 12:23 pm
Subject: data-types-url
sa3ruby
 
To avoid the necessity of chasing around multiple specifications and
interpreting things based on what isn't said, a simple statement should
be added to the data-types-url section of the spec:

IRIs MUST be converted to URIs before being included in an RSS 2.0
document.

Perhaps with a hypertext link to
http://www.apps.ietf.org/rfc/rfc3987.html#sec-3.2

Background: domain names in URIs have always been based on the
US-centric ASCII character set. Understandably, there has been a
growing demand for domain names which include characters which are
present in non-English languages. From RFC 3987:

The characters in URIs are frequently used for representing words of
natural languages. This usage has many advantages: Such URIs are
easier to memorize, easier to interpret, easier to transcribe, easier
to create, and easier to guess. For most languages other than
English, however, the natural script uses characters other than A -
Z. For many people, handling Latin characters is as difficult as
handling the characters of other scripts is for those who use only
the Latin alphabet. Many languages with non-Latin scripts are
transcribed with Latin letters. These transcriptions are now often
used in URIs, but they introduce additional ambiguities.

As an example, see:

http://www.atemschutzunfälle.de/asu.rdf

Despite the rdf extension, this actually is a valid RSS 0.93 feed.

Because of concerns about breaking existing software, this was
approached in two phases. RFC 3490 specifies a backwards compatible
mechanism for Internationalizing Domain Names in Applications (IDNA). It
involves encoding the non-ASCII characters in a special way. The domain
name above, which contains an umlaut, gets encoded thus:
www.xn--atemschutzunflle-7nb.de
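
The encoding step described above can be reproduced with a minimal
sketch. It assumes Python 3 and its built-in "idna" codec (which follows
the IDNA 2003 rules); the variable names are illustrative only.

```python
# Convert an internationalized host name to its ASCII ("xn--") form
# using Python's built-in "idna" codec.
host = "www.atemschutzunfälle.de"
ascii_host = host.encode("idna").decode("ascii")
print(ascii_host)  # www.xn--atemschutzunflle-7nb.de

# Decoding the ASCII form gives back the original name.
unicode_host = ascii_host.encode("ascii").decode("idna")
print(unicode_host)  # www.atemschutzunfälle.de
```

Only the label containing the umlaut is rewritten; the "www" and "de"
labels are already ASCII and pass through unchanged.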

The other way forward was captured in RFC 3987, and it allows such
characters to be included directly into IRIs. Quoting from that RFC:

a. A protocol or format element should be explicitly designated to
be able to carry IRIs. The intent is not to introduce IRIs into
contexts that are not defined to accept them. For example, XML
schema [XMLSchema] has an explicit type "anyURI" that includes
IRIs and IRI references. Therefore, IRIs and IRI references can
be in attributes and elements of type "anyURI". On the other
hand, in the HTTP protocol [RFC2616], the Request URI is defined
as a URI, which means that direct use of IRIs is not allowed in
HTTP requests.

Including IRIs as the url attribute of enclosure elements would quite
likely break existing software. As that was not the intent of IRIs, any
IRIs need to be mapped to a URI first.

Again, I don't think all this background needs to be included in the
spec, but a simple statement like the one suggested above would be
appropriate.

Test cases:

http://feedvalidator.org/testcases/rss20/data-types-url/

- Sam Ruby



#446 From: "Andy Henderson" <Andy@...>
Date: Tue Feb 28, 2006 3:16 pm
Subject: RE: data-types-url
andyatcita
 
Sam Ruby wrote:
> Again, I don't think all this background needs to be included in the
> spec, but a simple statement like the one suggested above would be
> appropriate.

I would say the background is very important. The simple statement would
have meant nothing to me.

Andy




#447 From: "rcade" <rcade@...>
Date: Tue Feb 28, 2006 4:46 pm
Subject: Re: data-types-url
rcade
 
--- In rss-public@yahoogroups.com, Sam Ruby <rubys@...> wrote:
> Again, I don't think all this background needs to be included in the
> spec, but a simple statement like the one suggested above would be
> appropriate.

Is there a workaround that RSS publishers in other languages are using
so that they may use IRIs as URLs in RSS, or are they simply forced to
employ URLs with the anglicized character set?






#448 From: Sam Ruby <rubys@...>
Date: Tue Feb 28, 2006 5:12 pm
Subject: Re: Re: data-types-url
sa3ruby
 
rcade wrote:
> --- In rss-public@yahoogroups.com, Sam Ruby <rubys@...> wrote:
>
>>Again, I don't think all this background needs to be included in the
>>spec, but a simple statement like the one suggested above would be
>>appropriate.
>
> Is there a workaround that RSS publishers in other languages are using
> so that they may use IRIs as URLs in RSS, or are they simply forced to
> employ URLs with the anglicized character set?

Unless you are setting out to change what RSS 2.0 is, one need only look
at the baseline 2.0.1-rv-6 (a.k.a. "Harvard") spec for the enclosure
element, which says quite simply and clearly "The url must be an http
url". Given this, one could say that "they simply [are] forced to
employ URLs with the anglicized character set"(*).

While that SOUNDS bad, in practice it is not. There is a clear and
reversible (for all but some pesky edge cases of no consequence) mapping
from IRIs to URIs. And all this is handled transparently by some browsers.

Try entering either http://www.atemschutzunfälle.de/asu.rdf or
http://www.xn--atemschutzunflle-7nb.de/asu.rdf in the Feed Validator.
Either way, you will get the same results. In the validation results,
you will see the "human friendly" version in the input field. If you
look at the text link at the bottom of the page, you will see the
internal or "IDNA" version, one that is completely acceptable to all
HTTP stacks, and conforms to the RSS 2.0 specification.

- Sam Ruby

(*) Note that I am talking about the "host" portion of the URI here.
Non-ASCII characters may be percent encoded and included in other
portions of the URI, for example, inside a query string.
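
As a rough illustration of the percent encoding mentioned in the
footnote, here is a small sketch using Python's standard
urllib.parse.quote. The path and query values are made up for the
example and do not refer to a real resource.

```python
from urllib.parse import quote

# Hypothetical path and query values containing a non-ASCII character.
path = "/unfälle/2006"
query = "q=atemschutzunfälle"

# quote() encodes each character as UTF-8 and percent-escapes the
# resulting bytes; `safe` lists ASCII delimiters to leave untouched.
encoded_path = quote(path, safe="/")
encoded_query = quote(query, safe="=&")

print(encoded_path)   # /unf%C3%A4lle/2006
print(encoded_query)  # q=atemschutzunf%C3%A4lle
```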



#450 From: "rcade" <rcade@...>
Date: Wed Mar 1, 2006 11:35 am
Subject: Re: data-types-url
rcade
 
--- In rss-public@yahoogroups.com, Sam Ruby <rubys@...> wrote:
>IRIs MUST be converted to URIs before being included in an RSS 2.0
>document.

I think that the following sentence in data-types-urls serves the same
purpose without sounding like a new requirement for RSS implementers:

These elements MUST NOT contain IRIs.

The word "IRIs" could link to http://www.apps.ietf.org/rfc/rfc3987.html.

Implementers who are conversant with IRIs would know this means a
conversion to URLs is necessary in order to be compliant with Really
Simple Syndication.

This wouldn't be a change because 2.0.1-rv-6 requires URLs, and IRIs
are not URLs.






#451 From: Sam Ruby <rubys@...>
Date: Wed Mar 1, 2006 11:47 am
Subject: Re: Re: data-types-url
sa3ruby
 
rcade wrote:
> --- In rss-public@yahoogroups.com, Sam Ruby <rubys@...> wrote:
>
>>IRIs MUST be converted to URIs before being included in an RSS 2.0
>>document.
>
> I think that the following sentence in data-types-urls serves the same
> purpose without sounding like a new requirement for RSS implementers:
>
> These elements MUST NOT contain IRIs.
>
> The word "IRIs" could link to http://www.apps.ietf.org/rfc/rfc3987.html.
>
> Implementers who are conversant with IRIs would know this means a
> conversion to URLs is necessary in order to be compliant with Really
> Simple Syndication.
>
> This wouldn't be a change because 2.0.1-rv-6 requires URLs, and IRIs
> are not URLs.

I am fine with that wording. Now let's look at how these two suggestions
can be combined.

These elements MUST NOT contain IRIs. IRIs MUST be converted to
URIs before being included in an RSS 2.0 document.

The first sentence sounds like "note to non-English people: you are
screwed". The second sentence says "no you are not, here's a path
forward, complete with a helpful link to section 3.2 of RFC 3987 which
tells you what you need to do".

But however you choose to word it is fine with me.

- Sam Ruby




#453 From: Sam Ruby <rubys@...>
Date: Wed Mar 1, 2006 12:11 pm
Subject: Re: Re: data-types-url
sa3ruby
 
Sam Ruby wrote:
> rcade wrote:
>
>>--- In rss-public@yahoogroups.com, Sam Ruby <rubys@...> wrote:
>>
>>>IRIs MUST be converted to URIs before being included in an RSS 2.0
>>>document.
>>
>>I think that the following sentence in data-types-urls serves the same
>>purpose without sounding like a new requirement for RSS implementers:
>>
>>These elements MUST NOT contain IRIs.
>
> I am fine with that wording,

Upon further reflection, that sentence is misleading.

The set of valid IRIs is a proper superset of the set of valid
URIs. So disallowing IRIs would disallow URIs.

The process defined for converting an IRI which is already a URI to a URI
is a no-op.
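
The no-op property can be checked with a simplified sketch of the
IRI-to-URI mapping. This is not a complete RFC 3987 implementation: the
function name iri_to_uri is hypothetical, user info and IPv6 literals
are ignored, and the host is handled with Python's built-in "idna"
codec.

```python
from urllib.parse import quote, urlsplit, urlunsplit

# ASCII characters left unescaped in path, query, and fragment.
# "%" is kept so existing percent-escapes are not encoded twice.
SAFE = "/?%:@!$&'()*+,;="


def iri_to_uri(iri: str) -> str:
    """Map an IRI to an ASCII-only URI (simplified sketch)."""
    parts = urlsplit(iri)
    host = parts.hostname or ""
    # Host: IDNA encoding; pure-ASCII hosts pass through unchanged.
    netloc = host.encode("idna").decode("ascii") if host else ""
    if parts.port is not None:
        netloc += f":{parts.port}"
    # Everything else: UTF-8 bytes, percent-escaped.
    path = quote(parts.path, safe=SAFE)
    query = quote(parts.query, safe=SAFE)
    fragment = quote(parts.fragment, safe=SAFE)
    return urlunsplit((parts.scheme, netloc, path, query, fragment))


iri = "http://www.atemschutzunfälle.de/asu.rdf"
uri = iri_to_uri(iri)
print(uri)  # http://www.xn--atemschutzunflle-7nb.de/asu.rdf

# Converting something that is already a URI changes nothing.
print(iri_to_uri(uri) == uri)  # True
```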

- Sam Ruby



#454 From: "A. Pagaltzis" <pagaltzis@...>
Date: Wed Mar 1, 2006 12:26 pm
Subject: Re: Re: data-types-url
a22pag
 
* Sam Ruby <rubys@...> [2006-03-01 13:15]:
>The set of valid IRIs is a proper superset of the set of
>valid URIs. So disallowing IRIs would disallow URIs.

I think the correct wording for the spec would be that IRIs with
non-ASCII characters MUST be given in their punycode-encoded URI
representation.
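
A check along these lines could be sketched as follows. The function
name needs_conversion is hypothetical, and the test is deliberately
crude: it only looks for non-ASCII characters and does not validate the
URI in any other way (str.isascii requires Python 3.7 or later).

```python
def needs_conversion(url: str) -> bool:
    """Return True if the value contains non-ASCII characters and so
    would have to be converted before going into an RSS 2.0 document."""
    return not url.isascii()


print(needs_conversion("http://www.atemschutzunfälle.de/asu.rdf"))         # True
print(needs_conversion("http://www.xn--atemschutzunflle-7nb.de/asu.rdf"))  # False
```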

Regards,
--
Aristotle Pagaltzis // <http://plasmasturm.org/>




Copyright © 2010 Yahoo! Inc. All rights reserved.