Search the web
Sign In
New User? Sign Up
caplet · The Caplet Group
? Already a member? Sign in to Yahoo!

Yahoo! Groups Tips

Did you know...
Real people. Real stories. See how Yahoo! Groups impacts members worldwide.

Best of Y! Groups

   Check them out and nominate your group.
Having problems with message search? Fill out this form to ensure your group is one of the first to be migrated to the new message search system.

Messages

  Messages Help
Advanced
Am I paranoid enough?   Message List  
Reply | Forward Message #287 of 310 |
Re: [caplet] Am I paranoid enough?

Mike Samuel wrote:
> 2009/2/16 David-Sarah Hopwood <david.hopwood@...>
>> Suppose that S is a Unicode string in which each character matches
>> ValidChar below, not containing the subsequences "<!", "</" or "]]>", and
>> not containing ("&" followed by a character not matching AmpFollower).
>> S encodes a syntactically correct ES3 or ES3.1 source text chosen by
>> an attacker.
>>
>> ValidChar :: one of
>> '\u0009' '\u000A' '\u000D' // TAB, LF, CR
>> [\u0020-\u007E]
>> [\u00A0-\u00AC]
>> [\u00AE-\u05FF]
>> [\u0604-\u06DC]
>> [\u06DE-\u070E]
>> [\u0710-\u17B3]
>> [\u17B6-\u200A]
>> [\u2010-\u2027]
>> [\u202F-\u205F]
>> [\u2070-\uD7FF]
>
> So no surrogates?

Correct. They're not characters (or even "noncharacters").

>> [\uE000-\uFDCF]
>> [\uFDF0-\uFEFE]
>> [\uFF00-\uFFEF]
>
> Why include FFEF?

It's unassigned, and there's no particular reason to exclude it.
(\uFFF0-\uFFF8 are also unassigned, but that's an area reserved
for "special" characters.)

>> AmpFollower :: one of
>> '=' '(' '+' '-' '!' '~' '"' '/' [0-9]
>> '\u0027' '\u005C' '\u0020' '\u0009' '\u000A' \u000D'
>> // single quote, backslash, space, TAB, LF, CR
>>
>> (ValidChar excludes format control characters, and some other
>> characters known to be mishandled by browsers. AmpFollower is
>> intended to exclude characters that can start an entity reference.)
>>
>> S is inserted between "<script>" and "</script>" in a place where a
>> <script> tag is allowed in an otherwise valid HTML document, or
>> between "<script><![CDATA[" and "]]></script>" in a place where a
>> <script> tag is allowed in an otherwise valid XHTML document.
>> The HTML or XHTML document starts with a correct <!DOCTYPE or
>> <?xml declaration respectively, and is encoded as well-formed
>> UTF-8.
>>
>> Are these restrictions sufficient to ensure that the embedded
>> script is interpreted as it would have been if referenced from
>> an external file, foiling any attempts of browsers to collude
>> with the attacker in misparsing it?
>
> You may still be subject to encoding attacks. I'm sure there are
> valid scripts that look like UTF-7, so if the script appears in the
> first 1024B, you might need to make sure it's preceded by a <meta>
> element specifying an encoding, and/or use the XML prologue form that
> specifies an encoding.

Right; I covered that in a follow-up. Is including a UTF-8 BOM at the
start sufficient for all browsers (that is, are there any browsers
in which a <meta> tag or other content sniffing can override an
explicit initial UTF-8 BOM, in either HTML or XHTML)?

HTML5 section 8.2.2.1 seems to indicate that "if the transport layer
specifies an encoding" (i.e. presumably the charset specified in
a Content-Type header), then that should override a BOM. That's
irritating, because it means that you have to assume that the server
gets the Content-Type right, *as well as* including a BOM for the
browsers in which Content-Type doesn't override sniffing
(Internet Explorer, at least), and for the case where the document
is read from a local file.

--
David-Sarah Hopwood ⚥




Tue Feb 17, 2009 11:13 am

david.hopwood@...
Send Email Send Email

Forward
Message #287 of 310 |
Expand Messages Author Sort by Date

Suppose that S is a Unicode string in which each character matches ValidChar below, not containing the subsequences "<!", "</" or "]]>", and not containing...
David-Sarah Hopwood
david.hopwood@...
Send Email
Feb 16, 2009
3:16 pm

No, I'm not paranoid enough yet. It's not sufficient only to say that the HTML is encoded as UTF-8 (see below). David-Sarah Hopwood wrote: [...] ... I meant,...
David-Sarah Hopwood
david.hopwood@...
Send Email
Feb 16, 2009
4:29 pm

2009/2/16 David-Sarah Hopwood <david.hopwood@...> ... So no surrogates? ... Why include FFEF? ... You may still be subject to encoding...
Mike Samuel
mikesamuel
Offline Send Email
Feb 16, 2009
11:38 pm

... Correct. They're not characters (or even "noncharacters"). ... It's unassigned, and there's no particular reason to exclude it. (\uFFF0-\uFFF8 are also...
David-Sarah Hopwood
david.hopwood@...
Send Email
Feb 17, 2009
11:13 am

... Isn't it the reflection of fffe, the byte-order-marker. This is probably a very minor issue, but if one part of a parser naively delegates to another...
Mike Samuel
mikesamuel
Offline Send Email
Feb 17, 2009
6:50 pm

... [...] ... No, \uFEFF is the BOM, and its byte-reflection \uFFFE is a noncharacter, so already excluded from ValidChar. (Thought you'd spotted something I'd...
David-Sarah Hopwood
david.hopwood@...
Send Email
Feb 18, 2009
5:26 pm

... Ah, quite right....
Mike Samuel
mikesamuel
Offline Send Email
Feb 18, 2009
9:54 pm
Advanced

Copyright © 2009 Yahoo! Inc. All rights reserved.
Privacy Policy - Terms of Service - Guidelines - Help