caplet · The Caplet Group
Am I paranoid enough?
Message #288 of 310
Re: [caplet] Am I paranoid enough?

2009/2/17 David-Sarah Hopwood <david.hopwood@...>:
> Mike Samuel wrote:
>> 2009/2/16 David-Sarah Hopwood <david.hopwood@...>
>>> Suppose that S is a Unicode string in which each character matches
>>> ValidChar below, not containing the subsequences "<!", "</" or "]]>", and
>>> not containing ("&" followed by a character not matching AmpFollower).
>>> S encodes a syntactically correct ES3 or ES3.1 source text chosen by
>>> an attacker.
>>>
>>> ValidChar :: one of
>>> '\u0009' '\u000A' '\u000D' // TAB, LF, CR
>>> [\u0020-\u007E]
>>> [\u00A0-\u00AC]
>>> [\u00AE-\u05FF]
>>> [\u0604-\u06DC]
>>> [\u06DE-\u070E]
>>> [\u0710-\u17B3]
>>> [\u17B6-\u200A]
>>> [\u2010-\u2027]
>>> [\u202F-\u205F]
>>> [\u2070-\uD7FF]
>>
>> So no surrogates?
>
> Correct. They're not characters (or even "noncharacters").
>
>>> [\uE000-\uFDCF]
>>> [\uFDF0-\uFEFE]
>>> [\uFF00-\uFFEF]
>>
>> Why include FFEF?
>
> It's unassigned, and there's no particular reason to exclude it.
> (\uFFF0-\uFFF8 are also unassigned, but that's an area reserved
> for "special" characters.)

Isn't it the reflection of FFFE, the byte order mark?
This is probably a very minor issue, but if one part of a parser
naively delegates to another parser that mistakenly treats its content
as a byte string instead of code units, the presence of a BOM might
cause the delegatee to misinterpret content when something that looks
like a BOM appears at the beginning of a chunk of embedded language.
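
One way to check which code points a byte-oriented parser could mistake for a BOM is to print their encodings. This is a minimal sketch using only Python's standard `codecs` module; the helper names `utf16_bytes` and `starts_with_bom` are hypothetical, chosen for illustration.

```python
import codecs

# BOM byte sequences as defined by Python's standard codecs module.
BOMS = (codecs.BOM_UTF8, codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE)


def utf16_bytes(code_point):
    """Return the (big-endian, little-endian) UTF-16 bytes of one code point."""
    ch = chr(code_point)
    return ch.encode("utf-16-be"), ch.encode("utf-16-le")


def starts_with_bom(data):
    """True if a byte string begins with a UTF-8 or UTF-16 BOM."""
    return any(data.startswith(bom) for bom in BOMS)


# Compare the BOM (U+FEFF), its byte-swapped form (U+FFFE), and U+FFEF.
for cp in (0xFEFF, 0xFFFE, 0xFFEF):
    be, le = utf16_bytes(cp)
    utf8 = chr(cp).encode("utf-8")
    print(f"U+{cp:04X}  UTF-16BE={be.hex()}  UTF-16LE={le.hex()}  UTF-8={utf8.hex()}")
```

In UTF-16, U+FEFF and U+FFFE are byte-swapped forms of each other, while none of the encodings of U+FFEF begins with a BOM byte sequence.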


>>> AmpFollower :: one of
>>> '=' '(' '+' '-' '!' '~' '"' '/' [0-9]
>>> '\u0027' '\u005C' '\u0020' '\u0009' '\u000A' '\u000D'
>>> // single quote, backslash, space, TAB, LF, CR
>>>
>>> (ValidChar excludes format control characters, and some other
>>> characters known to be mishandled by browsers. AmpFollower is
>>> intended to exclude characters that can start an entity reference.)
>>>
>>> S is inserted between "<script>" and "</script>" in a place where a
>>> <script> tag is allowed in an otherwise valid HTML document, or
>>> between "<script><![CDATA[" and "]]></script>" in a place where a
>>> <script> tag is allowed in an otherwise valid XHTML document.
>>> The HTML or XHTML document starts with a correct <!DOCTYPE or
>>> <?xml declaration respectively, and is encoded as well-formed
>>> UTF-8.
>>>
>>> Are these restrictions sufficient to ensure that the embedded
>>> script is interpreted as it would have been if referenced from
>>> an external file, foiling any attempts of browsers to collude
>>> with the attacker in misparsing it?
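
The restrictions on S quoted above can be sketched as a checker. This is only an illustration of the stated rules, not a vetted filter: the names `VALID_RANGES`, `is_valid_char`, and `is_safe_to_embed` are hypothetical, and rejecting a trailing "&" is an assumption, since the quoted rule only speaks of "&" followed by a character.

```python
# Code point ranges copied from the ValidChar production quoted above.
VALID_RANGES = [
    (0x0009, 0x000A), (0x000D, 0x000D),  # TAB, LF, CR
    (0x0020, 0x007E),
    (0x00A0, 0x00AC),
    (0x00AE, 0x05FF),
    (0x0604, 0x06DC),
    (0x06DE, 0x070E),
    (0x0710, 0x17B3),
    (0x17B6, 0x200A),
    (0x2010, 0x2027),
    (0x202F, 0x205F),
    (0x2070, 0xD7FF),
    (0xE000, 0xFDCF),
    (0xFDF0, 0xFEFE),
    (0xFF00, 0xFFEF),
]

# Characters from the AmpFollower production quoted above.
AMP_FOLLOWERS = set("=(+-!~\"/0123456789'\\ \t\n\r")

FORBIDDEN_SUBSEQUENCES = ("<!", "</", "]]>")


def is_valid_char(ch):
    """True if the character's code point falls in one of the ValidChar ranges."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in VALID_RANGES)


def is_safe_to_embed(s):
    """Apply the three stated restrictions to a candidate script string."""
    if not all(is_valid_char(ch) for ch in s):
        return False
    if any(seq in s for seq in FORBIDDEN_SUBSEQUENCES):
        return False
    for i, ch in enumerate(s):
        if ch == "&":
            # Assumption: a trailing "&" is rejected as well, because the
            # closing tag would follow it once S is embedded.
            if i + 1 >= len(s) or s[i + 1] not in AMP_FOLLOWERS:
                return False
    return True
```

For example, `is_safe_to_embed("var x = a & b;\n")` passes, while a string containing `</` or a zero-width space (U+200B) is rejected. The check says nothing about whether S is syntactically correct ES3, which the original question also requires.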
>>
>> You may still be subject to encoding attacks. I'm sure there are
>> valid scripts that look like UTF-7, so if the script appears in the
>> first 1024B, you might need to make sure it's preceded by a <meta>
>> element specifying an encoding, and/or use the XML prologue form that
>> specifies an encoding.
>
> Right; I covered that in a follow-up. Is including a UTF-8 BOM at the
> start sufficient for all browsers (that is, are there any browsers
> in which a <meta> tag or other content sniffing can override an
> explicit initial UTF-8 BOM, in either HTML or XHTML)?

Ah cool. I don't know the answer to that question.


> HTML5 section 8.2.2.1 seems to indicate that "if the transport layer
> specifies an encoding" (i.e. presumably the charset specified in
> a Content-Type header), then that should override a BOM. That's
> irritating, because it means that you have to assume that the server
> gets the Content-Type right, *as well as* including a BOM for the
> browsers in which Content-Type doesn't override sniffing
> (Internet Explorer, at least), and for the case where the document
> is read from a local file.

Yeah. I think the best thing to do is to use a fairly standard
encoding like UTF-8, and make sure the XML prologue, <meta
http-equiv="content-type">, and headers all agree.

I don't think that you can do much about file hosting services that go
out of their way to specify a wacky encoding. Putting a BOM at the
front will help with hosting services that make a genuine effort.
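
A rough sketch of the "make sure they all agree" check, assuming the page bytes and the Content-Type header value are already in hand. The function names `declared_encodings` and `encodings_agree` are hypothetical, the regular expressions are deliberately simple and are not an HTML or XML parser, and labels are compared literally (so "utf8" would not match "utf-8").

```python
import codecs
import re

LABEL = r"([\w.:-]+)"  # characters allowed in an encoding label (assumption)


def declared_encodings(content_type_header, document_bytes):
    """Collect each encoding declaration found, keyed by where it was found."""
    found = {}

    m = re.search(r"charset\s*=\s*\"?" + LABEL, content_type_header, re.I)
    if m:
        found["http"] = m.group(1).lower()

    if document_bytes.startswith(codecs.BOM_UTF8):
        found["bom"] = "utf-8"

    # Only look near the start of the document, where declarations belong.
    head = document_bytes[:1024].decode("ascii", errors="replace")

    m = re.search(r"<\?xml[^>]*\bencoding\s*=\s*[\"']" + LABEL, head, re.I)
    if m:
        found["xml"] = m.group(1).lower()

    m = re.search(r"<meta[^>]*\bcharset\s*=\s*[\"']?" + LABEL, head, re.I)
    if m:
        found["meta"] = m.group(1).lower()

    return found


def encodings_agree(found, expected="utf-8"):
    """True if at least one declaration exists and all of them name `expected`."""
    return bool(found) and all(label == expected for label in found.values())
```

A declaration that is missing is simply absent from the result, so a caller that wants every source to be present would also need to check the keys.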


> --
> David-Sarah Hopwood ⚥
>
>



Tue Feb 17, 2009 6:50 pm

mikesamuel
Messages in this thread:

David-Sarah Hopwood <david.hopwood@...>, Feb 16, 2009 3:16 pm
  Suppose that S is a Unicode string in which each character matches ValidChar below, not containing the subsequences "<!", "</" or "]]>", and not containing...

David-Sarah Hopwood <david.hopwood@...>, Feb 16, 2009 4:29 pm
  No, I'm not paranoid enough yet. It's not sufficient only to say that the HTML is encoded as UTF-8 (see below). David-Sarah Hopwood wrote: [...] ... I meant,...

Mike Samuel (mikesamuel), Feb 16, 2009 11:38 pm
  2009/2/16 David-Sarah Hopwood <david.hopwood@...> ... So no surrogates? ... Why include FFEF? ... You may still be subject to encoding...

David-Sarah Hopwood <david.hopwood@...>, Feb 17, 2009 11:13 am
  ... Correct. They're not characters (or even "noncharacters"). ... It's unassigned, and there's no particular reason to exclude it. (\uFFF0-\uFFF8 are also...

Mike Samuel (mikesamuel), Feb 17, 2009 6:50 pm
  ... Isn't it the reflection of fffe, the byte-order-marker. This is probably a very minor issue, but if one part of a parser naively delegates to another...

David-Sarah Hopwood <david.hopwood@...>, Feb 18, 2009 5:26 pm
  ... [...] ... No, \uFEFF is the BOM, and its byte-reflection \uFFFE is a noncharacter, so already excluded from ValidChar. (Thought you'd spotted something I'd...

Mike Samuel (mikesamuel), Feb 18, 2009 9:54 pm
  ... Ah, quite right....
