What's the recommended idiom for iterating over the elements of an
array? I had been using:
    for (var i = 0; i !== v.length; i += 1) {
        var element = v[+i];
        ...
    }
Are you expecting array iteration to be done as:
    for (var i = 0; i !== v.length; i += 1) {
        var element = ADSAFE.get(v, i);
        ...
    }
This is a little cumbersome.
I'm looking for something that will work reliably across browsers,
including IE6.
Could JSLint be made smart enough to realize that the variable "i" in
the above loop is always a positive number?
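
A minimal sketch of the second form, with the loop wrapped in a helper so the accessor call appears in only one place. The `get` function here is a hypothetical stand-in written so the example runs outside ADsafe; it is not the real ADSAFE.get, and `each` is likewise an illustrative name.

```javascript
// Hypothetical stand-in for ADSAFE.get (an assumption, not the real library):
// it refuses negative numbers and strings beginning with - or _.
function get(object, name) {
    if (typeof name === "number") {
        if (name >= 0) {
            return object[name];
        }
    } else if (typeof name === "string") {
        var first = name.charAt(0);
        if (first !== "-" && first !== "_") {
            return object[name];
        }
    }
    throw new Error("rejected subscript: " + name);
}

// Calls func(element, index) for each element of the array v.
function each(v, func) {
    for (var i = 0; i !== v.length; i += 1) {
        func(get(v, i), i);
    }
}

var seen = [];
each(["a", "b", "c"], function (element) {
    seen.push(element);
});
// seen is now ["a", "b", "c"]
```

With a helper like this the widget code never writes a subscript expression itself, which may make the restriction less cumbersome in practice.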
--Tyler
On Fri, Jul 31, 2009 at 9:20 AM, Douglas Crockford<douglas@...> wrote:
> The ADsafe verifier now rejects programs that use the arguments pseudo array.
>
> The ADsafe verifier now rejects programs that use expressions with the
> subscript operator, even when the + prefix is used. The ADSAFE.get and
> ADSAFE.set methods must be used instead. The subscript operator may be used with
> positive number literals and string literals that do not begin with - or _.
--
"Waterken News: Capability security on the Web"
http://waterken.sourceforge.net/recent.html
I added a new query pattern:
:trim
This produces a bunch from which all
text nodes containing only whitespace
are removed
I added these bunch methods:
.each(func)
The function is called for each node in
the bunch.
.title(value)
Set the title attribute of each node.
.getTitle()
Get the title attribute of each node.
I changed the way ADSAFE._intercept(func) works.
It is now called as a method, passing a function
that will be called when a new widget is started.
http://www.JSLint.com/ is now an ADsafe widget.
The ADsafe verifier now rejects programs that use the arguments pseudo array.
The ADsafe verifier now rejects programs that use expressions with the subscript
operator, even when the + prefix is used. The ADSAFE.get and ADSAFE.set methods
must be used instead. The subscript operator may be used with positive number
literals and string literals that do not begin with - or _.
I repaired some leakage in the ADsafe Ajax library. Grateful thanks to John
Mitchell, Sergio Maffeis, and Ankur Taly. http://www.doc.ic.ac.uk/~maffeis/
I also changed the restrictions on ADSAFE.get and ADSAFE.put. They now reject
negative numbers and strings starting with '-'.
No; arguments is rewritten in cajita to a___ and in valija to
Array.slice(arguments,1).
On Wed, Jul 29, 2009 at 5:46 PM, David-Sarah
Hopwood<david-sarah@...> wrote:
>
> <http://webreflection.blogspot.com/2009/06/javascript-arguments-weridness.html>
> [sic] notes the following strange misfeaturosity of SpiderMonkey, still
> present in Firefox 3.5.1:
>
>   function args() {
>     alert(arguments[-3] === arguments.callee);
>     alert(arguments[-2] === arguments.length);
>   };
>
> The potential security weakness here is that if a function delegates
> 'arguments' to a callee, it will inadvertently grant access to itself
> via arguments[-3].
>
> Jacaranda narrowly dodged being vulnerable to this weakness because
> 'arguments' is not a first-class expression, and can only be delegated
> by saying 'ConstArray(arguments)', which filters out all but
> nonnegative-indexed properties. Is the current implementation of
> Cajita or of any of the other subsets vulnerable?
>
> --
> David-Sarah Hopwood  ⚥  http://davidsarah.livejournal.com
>
>
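
The filtering idea in the quoted message can be sketched as follows. This is an illustration of copying only the nonnegative-indexed properties before delegating, not Jacaranda's actual ConstArray; the function names are invented for the example.

```javascript
// Copy only the properties indexed 0 .. length-1 into a fresh array, so
// nothing reachable through a negative index travels with the copy.
function copyArguments(args) {
    var result = [];
    for (var i = 0; i < args.length; i += 1) {
        result[i] = args[i];
    }
    return result;
}

// A callee that tries the negative index described in the message.
function inner(list) {
    return list[-3];
}

// The caller delegates a filtered copy instead of arguments itself.
function outer() {
    return inner(copyArguments(arguments));
}

var leaked = outer("x", "y");
```

Because `inner` only ever receives the fresh array, whatever an engine might expose on the original arguments object at a negative index is not passed along.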
Hey I wanted to let you guys know that for now I'm discontinuing research on
FBJS2. Basically at this time instead we're focusing on Facebook Connect
(external sites, not embedded on Facebook) and the overall stability of Facebook
Platform (that is, not introducing wild new technologies that will break
everything). Meanwhile I'm moving on to some other projects at Facebook. It's
probably for the best since I never did get to commit as much time to FBJS2 as I
had wanted to; it was always kind of a rogue project I was working on in my
spare time.
The good news is that we're open sourcing everything we made along the way
[Apache license]...
http://developers.facebook.com/fbopen/fbjs2-0.1.tar.gz
I know some of you were interested in seeing our rewrite rules, so they're there
for everyone. Also included is a hastily-written JS runtime, which is nothing
special and also very slow. The parser / rewrite engine is probably the most
reusable software in the package. We ended up using it internally for a few
different projects that need to wrangle Javascript.
--- In caplet@yahoogroups.com, Adam Barth <hk9565@...> wrote:
> Joel was playing around with ADsafe today and noticed that the
> verifier seems to be broken at the moment. For example, this widget
> passes the verifier but seems to cause problems:
>
> http://webblaze.org/jww/adsafe-broken.html
>
> Are we using the site wrong?
The fault was mine. Please ask Joel to try it again.
Hi folks,
Joel was playing around with ADsafe today and noticed that the
verifier seems to be broken at the moment. For example, this widget
passes the verifier but seems to cause problems:
http://webblaze.org/jww/adsafe-broken.html
Are we using the site wrong?
Thanks,
Adam
On Mon, May 25, 2009 at 3:37 PM, Brendan Eich <brendan@...> wrote:
> On May 25, 2009, at 2:56 PM, Tyler Close wrote:
>> On Sun, May 24, 2009 at 7:49 AM, Douglas Crockford
>> > I am considering the blocking of try/catch in ADsafe. I am concerned
>> > about the potential of using exceptions to deliver capabilities between
>> > isolated widgets.
>>
>> Javascript's catch is also problematic since it enables catching of
>> stack overflow and out of memory errors.
>>
>
> Out of memory is not catchable in SpiderMonkey.
What about stack overflow?
> What browsers did you test?
I did the testing during the caja security review and I believe I got
an exploit working in both IE 6 and Firefox 2 on Windows using the
stack overflow Error. The stack overflow Error was easier to work with
than the out of memory Error, since it's more predictable.
>> A widget could use this
>> ability to put another object, or perhaps even the browser, in an
>> inconsistent state. For example, the widget could use up all but one
>> stack frame and then make a call to a browser object which mutates
>> part of its state and then attempts a function call before making
>> additional mutations. The victim object would make the first mutation,
>> but suffer a stack overflow error before being able to complete the
>> rest of the mutations. The widget code could catch the Error, leaving
>> the victim object in the inconsistent state.
>>
>
> Sounds like a bug in the victim object. Why didn't it catch and clean
> up?
Because it wasn't expecting the function to throw. For example, the
called function may have been one implemented in the same block of
code as the calling function. Seeing that the called function didn't
throw, the programmer of the victim object didn't write a try catch
block. In general, protecting yourself against this kind of attack by
defensively writing try catch blocks is too cumbersome and error prone
to be feasible.
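
A small simulation of the scenario may help. The failure is injected by hand through a hypothetical `simulateExhaustedStack` switch rather than by actually exhausting the stack, and the `makeCounterPair` object is invented purely for illustration.

```javascript
// Illustrative victim: its invariant is that left + right stays 0.
function makeCounterPair() {
    var left = 0, right = 0, failNext = false;

    // Internal helper the programmer assumes can never throw.
    function note() {
        if (failNext) {
            failNext = false;
            throw new Error("simulated stack overflow");
        }
    }

    return {
        move: function () {
            left -= 1;      // first mutation
            note();         // call made between the two mutations
            right += 1;     // second mutation, skipped if note() throws
        },
        isConsistent: function () {
            return left + right === 0;
        },
        simulateExhaustedStack: function () {
            failNext = true;
        }
    };
}

var victim = makeCounterPair();
victim.move();
var before = victim.isConsistent();   // true

victim.simulateExhaustedStack();
try {
    victim.move();
} catch (e) {
    // The caller swallows the error, as the widget in the scenario would.
}
var after = victim.isConsistent();    // false: only the first mutation happened
```

The victim has no try/catch because, read on its own, `note` cannot throw; the inconsistency appears only when the caller controls how much stack is left.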
> Really, there are lots of potential bugs where an inconsistent state
> could result from errors. Making the errors fatal to the currently
> executing script only increases consistency in that particular script
> or event handler's control flow. The next script or event can still
> see the inconsistency.
Sounds like an excellent argument for making errors fail-stop. ADsafe
is in a good position to implement this, since it can wrap all event
handlers created by a widget.
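
One way the wrapping could look, as a sketch only: the names below are invented, and nothing here is taken from the ADsafe source. Once any wrapped handler throws, every handler wrapped by the same guard stops running.

```javascript
// Illustrative fail-stop guard: after any wrapped handler throws,
// no wrapped handler from the same guard runs again.
function makeFailStop() {
    var stopped = false;
    return {
        wrap: function (handler) {
            return function () {
                if (stopped) {
                    return undefined;
                }
                try {
                    return handler.apply(null, arguments);
                } catch (e) {
                    stopped = true;   // poison every handler of this guard
                    throw e;
                }
            };
        },
        isStopped: function () {
            return stopped;
        }
    };
}

var guard = makeFailStop();
var calls = 0;
var good = guard.wrap(function () { calls += 1; });
var bad = guard.wrap(function () { throw new Error("boom"); });

good();                          // runs
try { bad(); } catch (e) {}      // trips the guard
good();                          // skipped
// calls is 1
```

The design choice is to rethrow after setting the flag, so the host still sees the original error while the widget's later handlers are refused.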
> If you poison the whole well, meaning both fail-stop the script and
> make the entire reachable object graph inaccessible or error-tainted,
> then you can limit the leak. But there's still a termination channel.
Allowing the widget to explicitly terminate the page is fine, since it
implicitly has this authority anyways, since it could always just
enter an infinite loop, or use other DOS techniques.
--Tyler
On May 25, 2009, at 2:56 PM, Tyler Close wrote:
> On Sun, May 24, 2009 at 7:49 AM, Douglas Crockford
> <douglas@...> wrote:
> >> So, I suggest that you consider adding 'stack', and possibly
> >> 'message', 'stacktrace' and 'toSource', to the banned list.
> >
> > I do not understand the value in preventing information leaks here.
> > What is the hazard?
>
I'd like to know too -- you can throw an object that you could return,
so that's not it.
Is it the ES3 spec bug, not implemented by many browsers, where the
scope of the catch variable is a new Object (and so can be attacked by
Object.prototype setters or throwing a function that's called to
capture |this|)? What browsers still do that?
> > I am considering the blocking of try/catch in ADsafe. I am concerned
> > about the potential of using exceptions to deliver capabilities between
> > isolated widgets.
>
> Javascript's catch is also problematic since it enables catching of
> stack overflow and out of memory errors.
>
Out of memory is not catchable in SpiderMonkey.
What browsers did you test?
> A widget could use this
> ability to put another object, or perhaps even the browser, in an
> inconsistent state. For example, the widget could use up all but one
> stack frame and then make a call to a browser object which mutates
> part of its state and then attempts a function call before making
> additional mutations. The victim object would make the first mutation,
> but suffer a stack overflow error before being able to complete the
> rest of the mutations. The widget code could catch the Error, leaving
> the victim object in the inconsistent state.
>
Sounds like a bug in the victim object. Why didn't it catch and clean
up?
Really, there are lots of potential bugs where an inconsistent state
could result from errors. Making the errors fatal to the currently
executing script only increases consistency in that particular script
or event handler's control flow. The next script or event can still
see the inconsistency.
If you poison the whole well, meaning both fail-stop the script and
make the entire reachable object graph inaccessible or error-tainted,
then you can limit the leak. But there's still a termination channel.
/be
On Sun, May 24, 2009 at 7:49 AM, Douglas Crockford
<douglas@...> wrote:
>> So, I suggest that you consider adding 'stack', and possibly
>> 'message', 'stacktrace' and 'toSource', to the banned list.
>
> I do not understand the value in preventing information leaks here.
> What is the hazard?
>
> I am considering the blocking of try/catch in ADsafe. I am concerned about the
> potential of using exceptions to deliver capabilities between isolated
> widgets.
Javascript's catch is also problematic since it enables catching of
stack overflow and out of memory errors. A widget could use this
ability to put another object, or perhaps even the browser, in an
inconsistent state. For example, the widget could use up all but one
stack frame and then make a call to a browser object which mutates
part of its state and then attempts a function call before making
additional mutations. The victim object would make the first mutation,
but suffer a stack overflow error before being able to complete the
rest of the mutations. The widget code could catch the Error, leaving
the victim object in the inconsistent state.
Note that the widget doesn't need to guess the size of the stack, but
can measure it at runtime before engaging in the attack.
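
Measuring the available depth takes only a few lines. This sketch assumes an engine that reports stack exhaustion as a catchable error, which is the behaviour under discussion; it only counts frames and does nothing else.

```javascript
// Counts how many nested calls succeed before the engine refuses another.
// In V8 the refusal is a catchable RangeError; other engines use other
// error types, and 2009-era behaviour varied by browser.
function measureDepth() {
    var depth = 0;
    function descend() {
        depth += 1;
        descend();
    }
    try {
        descend();
    } catch (e) {
        // Stack exhausted; depth holds the number of frames that were entered.
    }
    return depth;
}

var available = measureDepth();
```

The result depends on the engine, its settings, and how deep the caller already is, so it has to be measured from the point where it will be used.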
--Tyler
> So, I suggest that you consider adding 'stack', and possibly
> 'message', 'stacktrace' and 'toSource', to the banned list.
I do not understand the value in preventing information leaks here.
What is the hazard?
I am considering the blocking of try/catch in ADsafe. I am concerned about the
potential of using exceptions to deliver capabilities between isolated widgets.
I slimmed down the ADsafe banned list. These are the names of members that may
not be accessed. This list is now:
arguments callee caller constructor eval
prototype unwatch valueOf watch
I dropped apply and call from the list. These methods are potentially dangerous
because a missing thisArg is replaced with the global object. But since ADsafe
does not allow any use of this, it does not matter what this gets bound to.
I am dropping the ADSAFE.invoke function. It is no longer needed since apply is
available.
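
As a rough illustration of how such a list can be consulted, here is a sketch of a membership test. The list contents come from the message above; the `banned` table and `isBanned` function are assumptions for the example and are not taken from the verifier's source.

```javascript
// The banned member names listed in the message above.
var banned = {
    "arguments": true, "callee": true, "caller": true,
    "constructor": true, "eval": true, "prototype": true,
    "unwatch": true, "valueOf": true, "watch": true
};

// True when name is on the list. hasOwnProperty is called through
// Object.prototype so that inherited names such as "toString" are not
// mistaken for entries of the table.
function isBanned(name) {
    return Object.prototype.hasOwnProperty.call(banned, name);
}

var results = [isBanned("callee"), isBanned("apply"), isBanned("call")];
// results is [true, false, false]
```

Using an own-property test matters here because two of the banned names, constructor and valueOf, are also inherited by every object.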
Web 2.0 Security & Privacy 2009
Claremont Resort in Oakland, California
May 21, 2009
http://w2spconf.com/2009/
The goal of this one day workshop is
to bring together researchers and practitioners from academia and industry
to focus on understanding Web 2.0 security and privacy issues, and establishing
new collaborations in these areas. This workshop is the 3rd in a series
of successful workshops on this topic.
Registration is now open. See the main conference web site for registration
information: http://oakland09.cs.virginia.edu/. (You may register and
participate in the workshop even if you are not attending the 30th IEEE
Symposium on Security & Privacy.)
If you would, please pass this information
on to your colleagues who may be interested in this workshop.
This workshop may be of interest to
subscribers of this mailing list
Do you know whether you will have time
in the next few days (before March 25) to review a few of the papers submitted
to W2SP this year? There are a few papers where we could use some
help with the reviewing
I added +tagName to the ADsafe query language. It selects the immediate sibling,
so dom.q("h1+p") selects all of the <p> that immediately follow an <h1>.
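
The selection rule can be sketched without a DOM. The plain-array model of siblings and the function name below are assumptions made so the example runs anywhere; this is not how the ADsafe query engine is implemented.

```javascript
// siblings: an array of records with a tagName property, in document order,
// all children of one parent (a stand-in for real DOM nodes).
function selectImmediateSiblings(siblings, firstTag, secondTag) {
    var result = [];
    for (var i = 1; i < siblings.length; i += 1) {
        if (siblings[i - 1].tagName === firstTag &&
                siblings[i].tagName === secondTag) {
            result.push(siblings[i]);
        }
    }
    return result;
}

var children = [
    {tagName: "h1", id: "title"},
    {tagName: "p", id: "intro"},
    {tagName: "p", id: "body"},
    {tagName: "h1", id: "next"},
    {tagName: "div", id: "box"}
];
var matched = selectImmediateSiblings(children, "h1", "p");
// matched contains only the record with id "intro"
```

The second <p> is not selected because its immediate predecessor is another <p>, not an <h1>.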
The goal of this one day workshop is to bring together
researchers and practitioners from academia and industry to focus on understanding
Web 2.0 security and privacy issues, and establishing new collaborations
in these areas.
Web 2.0 is about connecting people and amplifying the power
of working together. Enabled by a wave of new technology, these social
and business interactions rely on composition of content and services from
multiple sources, commonly called mash-ups, leading to systems with complex
trust boundaries. This trend is likely to continue because individuals
and businesses desire the efficiency and simplicity these technologies
offer.
Together with their virtues, these technologies raise issues
about management of identities, reputation, privacy, anonymity, transient
and long term relationships, and composition of function and content, both
on the server and on the client (web browser). Although the underlying
security and privacy issues are not new, the use of these technologies
on a wide scale and by a broad audience raises new questions. This workshop
is intended to discuss the limitations of current technologies and explore
alternatives.
The scope of W2SP 2009 includes, but is not limited to:
Trustworthy cloud-based services
Privacy and reputation in social networks
Usable security and privacy
Security for the mobile web
Identity management and pseudonymity
Advertisement and affiliate fraud
Provenance and governance
Security and privacy as a service
Web services/feeds/mashups
Security and privacy policies for composable content
Next-generation browser technology
Potential workshop participants should submit a paper on topics relevant to Web 2.0
security and privacy issues. We are seeking both short position papers
(2–4 pages) and refereed papers (a maximum of 8 pages). Papers longer
than 8 pages may be automatically rejected by the chair or workshop committee.
From the submissions, the program committee will strive to balance participation
between academia and industry and across topics. Selected papers will appear
on the workshop web site.
Workshop Co-Chairs
Larry Koved (IBM Research)
Dan S. Wallach (Rice University)
Program Chair
Adam Barth (UC Berkeley)
Program Committee
Ben Adida (Harvard University)
Dirk Balfanz (PARC)
Adam Barth (UC Berkeley)
Konstantin (Kosta) Beznosov
Suresh Chari (IBM Research)
Hao Chen (UC Davis)
Douglas Crockford (Yahoo)
Chris Karlof (UC Berkeley)
Larry Koved (IBM Research)
Shriram Krishnamurthi (Brown University)
Collin Jackson (Stanford University)
Rob Johnson (Stony Brook University)
John C. Mitchell (Stanford University)
Sean W. Smith (Dartmouth University)
Helen Wang (Microsoft Research)
Dan S. Wallach (Rice University)
Important Dates
Paper submission deadline: March 6, 2009, (11:59pm US-Eastern)
Workshop acceptance notification date: March 31, 2009
Workshop date: Thursday, May 21, 2009
Workshop paper submission web site: http://w2spconf.com/2009/
2009/2/18 David-Sarah Hopwood <david.hopwood@...>:
> Mike Samuel wrote:
>> 2009/2/17 David-Sarah Hopwood <david.hopwood@...>:
>>> Mike Samuel wrote:
>>>> 2009/2/16 David-Sarah Hopwood <david.hopwood@...>
>>>>> ValidChar :: one of
> [...]
>>>>> [\uFF00-\uFFEF]
>>>> Why include FFEF?
>>> It's unassigned, and there's no particular reason to exclude it.
>>> (\uFFF0-\uFFF8 are also unassigned, but that's an area reserved
>>> for "special" characters.)
>>
>> Isn't it the reflection of fffe, the byte-order-marker?
>
> No, \uFEFF is the BOM, and its byte-reflection \uFFFE is a noncharacter,
> so already excluded from ValidChar.
Ah, quite right.
> (Thought you'd spotted something I'd missed for a second, there.)
>
> --
> David-Sarah Hopwood ⚥
Mike Samuel wrote:
> 2009/2/17 David-Sarah Hopwood <david.hopwood@...>:
>> Mike Samuel wrote:
>>> 2009/2/16 David-Sarah Hopwood <david.hopwood@...>
>>>> ValidChar :: one of
[...]
>>>> [\uFF00-\uFFEF]
>>> Why include FFEF?
>> It's unassigned, and there's no particular reason to exclude it.
>> (\uFFF0-\uFFF8 are also unassigned, but that's an area reserved
>> for "special" characters.)
>
> Isn't it the reflection of fffe, the byte-order-marker?
No, \uFEFF is the BOM, and its byte-reflection \uFFFE is a noncharacter,
so already excluded from ValidChar.
(Thought you'd spotted something I'd missed for a second, there.)
--
David-Sarah Hopwood ⚥
2009/2/17 David-Sarah Hopwood <david.hopwood@...>:
> Mike Samuel wrote:
>> 2009/2/16 David-Sarah Hopwood <david.hopwood@...>
>>> Suppose that S is a Unicode string in which each character matches
>>> ValidChar below, not containing the subsequences "<!", "</" or "]]>", and
>>> not containing ("&" followed by a character not matching AmpFollower).
>>> S encodes a syntactically correct ES3 or ES3.1 source text chosen by
>>> an attacker.
>>>
>>> ValidChar :: one of
>>> '\u0009' '\u000A' '\u000D' // TAB, LF, CR
>>> [\u0020-\u007E]
>>> [\u00A0-\u00AC]
>>> [\u00AE-\u05FF]
>>> [\u0604-\u06DC]
>>> [\u06DE-\u070E]
>>> [\u0710-\u17B3]
>>> [\u17B6-\u200A]
>>> [\u2010-\u2027]
>>> [\u202F-\u205F]
>>> [\u2070-\uD7FF]
>>
>> So no surrogates?
>
> Correct. They're not characters (or even "noncharacters").
>
>>> [\uE000-\uFDCF]
>>> [\uFDF0-\uFEFE]
>>> [\uFF00-\uFFEF]
>>
>> Why include FFEF?
>
> It's unassigned, and there's no particular reason to exclude it.
> (\uFFF0-\uFFF8 are also unassigned, but that's an area reserved
> for "special" characters.)
Isn't it the reflection of fffe, the byte-order-marker?
This is probably a very minor issue, but if one part of a parser
naively delegates to another parser that mistakenly treats its content
as a byte string instead of code units, the presence of a BOM might
cause the delegatee to misinterpret content when something that looks
like a BOM appears at the beginning of a chunk of embedded language.
>>> AmpFollower :: one of
>>> '=' '(' '+' '-' '!' '~' '"' '/' [0-9]
>>> '\u0027' '\u005C' '\u0020' '\u0009' '\u000A' '\u000D'
>>> // single quote, backslash, space, TAB, LF, CR
>>>
>>> (ValidChar excludes format control characters, and some other
>>> characters known to be mishandled by browsers. AmpFollower is
>>> intended to exclude characters that can start an entity reference.)
>>>
>>> S is inserted between "<script>" and "</script>" in a place where a
>>> <script> tag is allowed in an otherwise valid HTML document, or
>>> between "<script><![CDATA[" and "]]></script>" in a place where a
>>> <script> tag is allowed in an otherwise valid XHTML document.
>>> The HTML or XHTML document starts with a correct <!DOCTYPE or
>>> <?xml declaration respectively, and is encoded as well-formed
>>> UTF-8.
>>>
>>> Are these restrictions sufficient to ensure that the embedded
>>> script is interpreted as it would have been if referenced from
>>> an external file, foiling any attempts of browsers to collude
>>> with the attacker in misparsing it?
>>
>> You may still be subject to encoding attacks. I'm sure there are
>> valid scripts that look like UTF-7, so if the script appears in the
>> first 1024B, you might need to make sure it's preceded by a <meta>
>> element specifying an encoding, and/or use the XML prologue form that
>> specifies an encoding.
>
> Right; I covered that in a follow-up. Is including a UTF-8 BOM at the
> start sufficient for all browsers (that is, are there any browsers
> in which a <meta> tag or other content sniffing can override an
> explicit initial UTF-8 BOM, in either HTML or XHTML)?
Ah cool. I don't know the answer to that question.
> HTML5 section 8.2.2.1 seems to indicate that "if the transport layer
> specifies an encoding" (i.e. presumably the charset specified in
> a Content-Type header), then that should override a BOM. That's
> irritating, because it means that you have to assume that the server
> gets the Content-Type right, *as well as* including a BOM for the
> browsers in which Content-Type doesn't override sniffing
> (Internet Explorer, at least), and for the case where the document
> is read from a local file.
Yeah. I think the best thing to do is to use a fairly standard
encoding like UTF-8, and make sure the XML prologue, <meta
http-equiv="content-type">, and headers all agree.
I don't think that you can do much about file hosting services that go
out of their way to specify a whacky encoding. Putting a BOM at the
front will help hosting services that make a genuine effort.
> --
> David-Sarah Hopwood ⚥
>
>
Mike Samuel wrote:
> 2009/2/16 David-Sarah Hopwood <david.hopwood@...>
>> Suppose that S is a Unicode string in which each character matches
>> ValidChar below, not containing the subsequences "<!", "</" or "]]>", and
>> not containing ("&" followed by a character not matching AmpFollower).
>> S encodes a syntactically correct ES3 or ES3.1 source text chosen by
>> an attacker.
>>
>> ValidChar :: one of
>> '\u0009' '\u000A' '\u000D' // TAB, LF, CR
>> [\u0020-\u007E]
>> [\u00A0-\u00AC]
>> [\u00AE-\u05FF]
>> [\u0604-\u06DC]
>> [\u06DE-\u070E]
>> [\u0710-\u17B3]
>> [\u17B6-\u200A]
>> [\u2010-\u2027]
>> [\u202F-\u205F]
>> [\u2070-\uD7FF]
>
> So no surrogates?
Correct. They're not characters (or even "noncharacters").
>> [\uE000-\uFDCF]
>> [\uFDF0-\uFEFE]
>> [\uFF00-\uFFEF]
>
> Why include FFEF?
It's unassigned, and there's no particular reason to exclude it.
(\uFFF0-\uFFF8 are also unassigned, but that's an area reserved
for "special" characters.)
>> AmpFollower :: one of
>> '=' '(' '+' '-' '!' '~' '"' '/' [0-9]
>> '\u0027' '\u005C' '\u0020' '\u0009' '\u000A' '\u000D'
>> // single quote, backslash, space, TAB, LF, CR
>>
>> (ValidChar excludes format control characters, and some other
>> characters known to be mishandled by browsers. AmpFollower is
>> intended to exclude characters that can start an entity reference.)
>>
>> S is inserted between "<script>" and "</script>" in a place where a
>> <script> tag is allowed in an otherwise valid HTML document, or
>> between "<script><![CDATA[" and "]]></script>" in a place where a
>> <script> tag is allowed in an otherwise valid XHTML document.
>> The HTML or XHTML document starts with a correct <!DOCTYPE or
>> <?xml declaration respectively, and is encoded as well-formed
>> UTF-8.
>>
>> Are these restrictions sufficient to ensure that the embedded
>> script is interpreted as it would have been if referenced from
>> an external file, foiling any attempts of browsers to collude
>> with the attacker in misparsing it?
>
> You may still be subject to encoding attacks. I'm sure there are
> valid scripts that look like UTF-7, so if the script appears in the
> first 1024B, you might need to make sure it's preceded by a <meta>
> element specifying an encoding, and/or use the XML prologue form that
> specifies an encoding.
Right; I covered that in a follow-up. Is including a UTF-8 BOM at the
start sufficient for all browsers (that is, are there any browsers
in which a <meta> tag or other content sniffing can override an
explicit initial UTF-8 BOM, in either HTML or XHTML)?
HTML5 section 8.2.2.1 seems to indicate that "if the transport layer
specifies an encoding" (i.e. presumably the charset specified in
a Content-Type header), then that should override a BOM. That's
irritating, because it means that you have to assume that the server
gets the Content-Type right, *as well as* including a BOM for the
browsers in which Content-Type doesn't override sniffing
(Internet Explorer, at least), and for the case where the document
is read from a local file.
--
David-Sarah Hopwood ⚥
2009/2/16 David-Sarah Hopwood <david.hopwood@...>
>
> Suppose that S is a Unicode string in which each character matches
> ValidChar below, not containing the subsequences "<!", "</" or "]]>", and
> not containing ("&" followed by a character not matching AmpFollower).
> S encodes a syntactically correct ES3 or ES3.1 source text chosen by
> an attacker.
>
> ValidChar :: one of
> '\u0009' '\u000A' '\u000D' // TAB, LF, CR
> [\u0020-\u007E]
> [\u00A0-\u00AC]
> [\u00AE-\u05FF]
> [\u0604-\u06DC]
> [\u06DE-\u070E]
> [\u0710-\u17B3]
> [\u17B6-\u200A]
> [\u2010-\u2027]
> [\u202F-\u205F]
> [\u2070-\uD7FF]
So no surrogates?
> [\uE000-\uFDCF]
> [\uFDF0-\uFEFE]
> [\uFF00-\uFFEF]
Why include FFEF?
> AmpFollower :: one of
> '=' '(' '+' '-' '!' '~' '"' '/' [0-9]
> '\u0027' '\u005C' '\u0020' '\u0009' '\u000A' '\u000D'
> // single quote, backslash, space, TAB, LF, CR
>
> (ValidChar excludes format control characters, and some other
> characters known to be mishandled by browsers. AmpFollower is
> intended to exclude characters that can start an entity reference.)
>
> S is inserted between "<script>" and "</script>" in a place where a
> <script> tag is allowed in an otherwise valid HTML document, or
> between "<script><![CDATA[" and "]]></script>" in a place where a
> <script> tag is allowed in an otherwise valid XHTML document.
> The HTML or XHTML document starts with a correct <!DOCTYPE or
> <?xml declaration respectively, and is encoded as well-formed
> UTF-8.
>
> Are these restrictions sufficient to ensure that the embedded
> script is interpreted as it would have been if referenced from
> an external file, foiling any attempts of browsers to collude
> with the attacker in misparsing it?
You may still be subject to encoding attacks. I'm sure there are
valid scripts that look like UTF-7, so if the script appears in the
first 1024B, you might need to make sure it's preceded by a <meta>
element specifying an encoding, and/or use the XML prologue form that
specifies an encoding.
> Are some of the restrictions unnecessary?
>
> --
> David-Sarah Hopwood ⚥
No, I'm not paranoid enough yet. It's not sufficient only to say
that the HTML is encoded as UTF-8 (see below).
David-Sarah Hopwood wrote:
[...]
> The HTML or XHTML document starts with a correct <!DOCTYPE or
> <?xml declaration respectively,
I meant, the document starts with <!DOCTYPE HTML> in the case
of HTML, or <?xml version="1.0"?><!DOCTYPE HTML> in the case of
XHTML.
(This will also put the parser into sane^H^H^H^Hstandards mode.)
> and is encoded as well-formed UTF-8.
The document must also start with a UTF-8 BOM, *and* must not
contain a META directive that changes the charset, *and* in the
case of HTML, must either be retrieved from a local file or over
HTTP with the header "Content-Type: text/html; charset=UTF-8".
This is because the method of determining the encoding is chosen
based on the phase of the moon.
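
For the HTML case served over HTTP, the conditions above might be assembled like this. The helper name and return shape are assumptions for illustration; the helper does not check the script itself, and the title element is added only to keep the document otherwise valid.

```javascript
// Illustrative helper: wraps a script in an HTML document that follows the
// conditions listed above. It assumes the script already satisfies the
// character and subsequence restrictions, which it does not verify.
function buildHtmlResponse(script) {
    return {
        headers: {"Content-Type": "text/html; charset=UTF-8"},
        // U+FEFF at the start becomes the UTF-8 BOM once the body is
        // encoded as UTF-8. No META charset directive is emitted.
        body: "\uFEFF<!DOCTYPE HTML>\n" +
            "<title>widget</title>\n" +
            "<script>" + script + "</script>\n"
    };
}

var response = buildHtmlResponse("var x = 1;");
```

Whether a BOM, a header, or a META directive wins when they disagree is exactly what varies between browsers, so the sketch simply makes all of them agree.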
Any other problems?
--
David-Sarah Hopwood ⚥
Suppose that S is a Unicode string in which each character matches
ValidChar below, not containing the subsequences "<!", "</" or "]]>", and
not containing ("&" followed by a character not matching AmpFollower).
S encodes a syntactically correct ES3 or ES3.1 source text chosen by
an attacker.
ValidChar :: one of
'\u0009' '\u000A' '\u000D' // TAB, LF, CR
[\u0020-\u007E]
[\u00A0-\u00AC]
[\u00AE-\u05FF]
[\u0604-\u06DC]
[\u06DE-\u070E]
[\u0710-\u17B3]
[\u17B6-\u200A]
[\u2010-\u2027]
[\u202F-\u205F]
[\u2070-\uD7FF]
[\uE000-\uFDCF]
[\uFDF0-\uFEFE]
[\uFF00-\uFFEF]
AmpFollower :: one of
'=' '(' '+' '-' '!' '~' '"' '/' [0-9]
'\u0027' '\u005C' '\u0020' '\u0009' '\u000A' '\u000D'
// single quote, backslash, space, TAB, LF, CR
(ValidChar excludes format control characters, and some other
characters known to be mishandled by browsers. AmpFollower is
intended to exclude characters that can start an entity reference.)
S is inserted between "<script>" and "</script>" in a place where a
<script> tag is allowed in an otherwise valid HTML document, or
between "<script><![CDATA[" and "]]></script>" in a place where a
<script> tag is allowed in an otherwise valid XHTML document.
The HTML or XHTML document starts with a correct <!DOCTYPE or
<?xml declaration respectively, and is encoded as well-formed
UTF-8.
Are these restrictions sufficient to ensure that the embedded
script is interpreted as it would have been if referenced from
an external file, foiling any attempts of browsers to collude
with the attacker in misparsing it?
Are some of the restrictions unnecessary?
--
David-Sarah Hopwood ⚥
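
The restrictions on S in the message above can be transcribed into a checker. This sketch follows the stated rules literally, works on UTF-16 code units, and does not check that S is syntactically correct source text; the function and variable names are invented for the example.

```javascript
// Every code unit of S must fall in one of the ValidChar ranges.
var validChars = new RegExp("^[" +
    "\\u0009\\u000A\\u000D" +
    "\\u0020-\\u007E\\u00A0-\\u00AC\\u00AE-\\u05FF" +
    "\\u0604-\\u06DC\\u06DE-\\u070E\\u0710-\\u17B3" +
    "\\u17B6-\\u200A\\u2010-\\u2027\\u202F-\\u205F" +
    "\\u2070-\\uD7FF\\uE000-\\uFDCF\\uFDF0-\\uFEFE" +
    "\\uFF00-\\uFFEF]*$");

// The forbidden subsequences "<!", "</" and "]]>".
var forbiddenSequence = /<!|<\/|\]\]>/;

// "&" followed by a character that is not an AmpFollower.
// A trailing "&" with nothing after it is not rejected by this literal reading.
var badAmpersand = /&[^=(+\-!~"\/0-9'\\ \t\n\r]/;

function isEmbeddable(s) {
    return validChars.test(s) &&
        !forbiddenSequence.test(s) &&
        !badAmpersand.test(s);
}

var checks = [
    isEmbeddable("x = 1 & 2;"),        // "&" followed by a space
    isEmbeddable("x = a && b;"),       // first "&" is followed by "&"
    isEmbeddable("x = '</div>';"),     // contains "</"
    isEmbeddable("x = '\u200B';")      // U+200B is outside ValidChar
];
// checks is [true, false, false, false]
```

Note one consequence of the rules as written: the && operator is rejected when written without separation, since & is not itself an AmpFollower.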
On Feb 10, 2009, at 6:36 PM, Mike Samuel wrote:
> and there's the newlines in block comments thing return /*
> */ foo();
>
Fixed in Firefox 3.1 beta nightlies:
https://bugzilla.mozilla.org/show_bug.cgi?id=475834
We could push the fix back into a 3.0.x maintenance release if it
would help. Anyone with https://bugzilla.mozilla.org editbugs
permission who wants this, feel free to nominate the patch for approval.
/be
2009/2/10 David-Sarah Hopwood <david.hopwood@...>:
> Marcel Laverdet wrote:
>>
>> From what I remember this started out as a bug in IE and then Firefox
>> followed suit for compatibility which left the other browsers with no
>> choice. I can't find the original bug but `/[/]/` only started parsing
>> in FF1.5, in FF1.0 it would throw a syntax error.
>>
>> You could throw out any malformed regexp literals (any that differ
>> between ES3 \ ES3.1) which is a fairly small subset and you would be ok.
>
> I could, if I knew that there were no more bugs like this. Note that
> lexical confusion attacks of this kind can easily be turned into complete
> breaks of a subset implementation:
>
> [ /[/]/ /alert('toast')]/ + 1
>
> Verifier sees valid, harmless code:
> [ new RegExp("[") ] / new RegExp("alert('toast')]") + 1
>
> Browser runs exploit code:
> [ new RegExp("[\/]") / alert('toast') ] / +1
>
> Since there's no way that I could reliably have known about the IE lexer
> bug, it's just too risky.
>
> Anyone know of other bugs where common JS implementations lex or parse
> valid ES3 code with a different meaning than specified? (The only one
> I can think of right now is \v in IE, but at least that doesn't result
> in a parse with a different structure.)
Plenty. But I suspect you know of them. There's conditional
compilation comments /* @cc_on */,
and there's the newlines in block comments thing return /*
*/ foo();
and there's format control characters between pairs like */ and \".
There's other tricks you can do with \u escapes in identifiers and NUL
and BOM characters in source.
> --
> David-Sarah Hopwood ⚥
On Feb 10, 2009, at 6:34 AM, David-Sarah Hopwood wrote:
> Brendan Eich wrote:
> > On Feb 9, 2009, at 9:42 AM, Marcel Laverdet wrote:
> >
> >> From what I remember this started out as a bug in IE and then
> >> Firefox followed suit for compatibility which left the other
> >> browsers with no choice.
> >
> > No, other browsers followed suit first.
> >
> >> I can't find the original bug but `/[/]/` only started parsing in
> >> FF1.5, in FF1.0 it would throw a syntax error.
> >
> > https://bugzilla.mozilla.org/show_bug.cgi?id=309840
>
> <https://bugzilla.mozilla.org/show_bug.cgi?id=309840#c12>
>
> # This fixes a highly dup'ed IE compatibility bug. It's an extension
> # to ECMA syntax that's allowed by Section 16. I'm approving it so
> # that we can get it into 1.8b5 / Firefox 1.5b2.
>
> As the example in my first post demonstrated, it is absolutely not
> correct that this was an allowed Section 16 extension.
>
You're right, but so what? The IE bug and monopoly combined to create
a de-facto standard. Appealing to the de-jure standard does you no
good, and correcting my 2005-era misunderstanding (you've corrected it
more recently in es-discuss) does not change the de-facto standard
trumping the de-jure one.
> In fact this just makes me even more worried: it seems that Section 16
> is being misinterpreted in a way that prevents independently developed
> parsers, implemented strictly from the spec, from being able to match the
> parsing behaviour of browsers on syntactically valid ES3 code. Is this
> just a one-off mistake, or is Section 16 consistently being interpreted
> too loosely?
>
This has nothing to do with Section 16 or my former misunderstanding
of it, and everything to do with IE forcing a de-facto standard. As
far as I know, no one at Microsoft added the bug allowing unescaped /
in a character class by arguing based on a misinterpretation of
Section 16. I think you are barking up the wrong tree.
/be
>
>
> --
> David-Sarah Hopwood ⚥
>
>
>
Douglas Crockford wrote:
> David-Sarah Hopwood wrote:
>> Consider the following JavaScript source:
>>
>> [ /[/]/ /foo]/ + bar
>>
>> According to the ES3 spec, this is interpreted as:
>>
>> [ new RegExp("[") ] / new RegExp("foo]") + bar
>>
>> According to the ES3.1 draft spec, it is interpreted as:
>>
>> [ new RegExp("[\/]") / foo ] / +bar
>>
>> Apparently, Firefox and IE7 were lexing regexp literals in the way
>> ES3.1 specifies. I had considered re-allowing regexp literals in
>> Jacaranda 0.4, but this has scared me off doing so -- the potential
>> for lexical confusion attacks is just too great.
>
> ADsafe rejects [ /[/]/ /foo]/ + bar. Just because ECMAScript says it's ok
> doesn't mean that ADsafe must. ADsafe insists that all internal / must have \.
I'm confused -- how does it know that the middle '/' in "/[/]/" is
"internal"? Is it lexing according to the intersection of Pattern
from section 15.10.1, and RegularExpressionBody?
--
David-Sarah Hopwood ⚥
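
The rule in the quoted reply, that every internal / must carry a \, can be sketched as a check on a candidate regexp body, meaning the text between the delimiting slashes under the ES3.1 reading. This is only an illustration of the rule; how ADsafe's lexer actually decides where the literal ends is the open question above, and the function name is invented.

```javascript
// Returns true if body contains a "/" that is not preceded by an
// escaping backslash. A backslash always consumes the next character.
function hasUnescapedSlash(body) {
    var i = 0;
    while (i < body.length) {
        var c = body.charAt(i);
        if (c === "\\") {
            i += 2;            // skip the backslash and the escaped character
        } else if (c === "/") {
            return true;
        } else {
            i += 1;
        }
    }
    return false;
}

var verdicts = [
    hasUnescapedSlash("[/]"),      // the body from the disputed example
    hasUnescapedSlash("[\\/]"),    // the same class with the slash escaped
    hasUnescapedSlash("a\\\\/b")   // escaped backslash, then a bare slash
];
// verdicts is [true, false, true]
```

A verifier that rejects any literal whose body fails this check never has to decide whether a / inside a character class ends the literal, because both readings then agree on where it ends.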