From an article in "SOA Advisor" titled "Enterprise Web 2.0, SOA Linkage: Will lack of standards be a hindrance?" by Srinivas Padmanabhuni of InfoSys. (If you...
... Maybe I'm being horribly unfair to protocol designers, but implementors do. An example is entities in URIs embedded in HTML. <a href="foo?bar=a&baz=b"> is...
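A minimal sketch of the entity issue in that href, assuming the HTML 4 / XHTML rule that a literal ampersand inside an attribute value is written as `&amp;`. The variable names are illustrative only and do not come from the thread; `html.escape` and `html.unescape` are from the Python standard library.

```python
import html

# The query string as the application means it (a raw URI reference).
raw_href = "foo?bar=a&baz=b"

# Escaped for embedding in an HTML attribute: "&" becomes "&amp;".
attr_value = html.escape(raw_href, quote=True)
markup = '<a href="{}">link</a>'.format(attr_value)

# A conforming parser undoes the entity and recovers the original URI reference.
recovered = html.unescape(attr_value)
```

The point is that there are two layers: the URI itself, and the HTML text that carries it. Implementors who skip the second layer produce markup that browsers usually tolerate but validators reject.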
On standards: The benefit of HTTP and XML and HTML is not that they are well-designed protocol and syntax and language, but that there are many different and...
Ok. I think the time for debate has passed, but it's a slow Monday so I'll bite :) There are a few problems: (1) Documents embed other documents using a melange...
... Okay, I'll try to say the obvious here -- although no one individual is responsible, we find ourselves in the middle of a big hacked-up pile of conventions...
... Ok. I think it's useful to make a distinction between the n:1 mappings and the 1:1 mappings. If you're escaping (which I defined as n:1), you have to...
To answer your direct questions: I don't know any formal definition for "escaping" except as a part of "encoding" -- you encode a sequence of bytes into (a...
... I still don't understand. My reading of the spec says that the first sequence of characters is in ASCII. If that's the case, then an HTML validator should...
117
David Hopwood
david.hopwood@...
Oct 20, 2007 4:34 am
... URIs are sequences of characters that encode a sequence of bytes, which *may* in turn encode a sequence of Unicode characters. For URIs that have some...
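The layering described here (characters that encode bytes, which may in turn encode Unicode characters) can be sketched as follows. This is an illustration only: it assumes UTF-8 as the byte-to-character encoding, which the message does not state, and the sample strings are made up.

```python
from urllib.parse import quote, unquote_to_bytes

# Layer 3: Unicode characters (assumed to be encoded as UTF-8 for this sketch).
text = "caf\u00e9"

# Layer 2: the sequence of bytes those characters encode to.
octets = text.encode("utf-8")            # b"caf\xc3\xa9"

# Layer 1: the URI characters (ASCII only) that encode those bytes.
uri_part = quote(octets, safe="")        # "caf%C3%A9"

# Decoding goes back down the same layers.
decoded_octets = unquote_to_bytes(uri_part)
decoded_text = decoded_octets.decode("utf-8")

# The bytes need not be text at all: %FF is a valid escape,
# but the byte 0xFF is not valid UTF-8 on its own.
lone_octet = unquote_to_bytes("%FF")
```

The last line is why the "may" matters: a URI can always be mapped to bytes, but mapping those bytes to characters depends on an encoding that the URI alone does not guarantee.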
... It seems to be accepting lots of invalid HTML. For example, the simple <iframe xx="yy"></iframe> seems to pass, whereas http://validator.w3.org/check...
Sorry. I was reading 2396 (not 3986), which says: An escaped octet is encoded as a character triplet, consisting of the percent character "%" followed by the...
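A small sketch of the character triplet, assuming the usual percent-escaping form in which "%" is followed by two hexadecimal digits giving the octet's value. The function names `escape_octet` and `unescape_triplet` are hypothetical helpers written for this example, not anything from the RFC or the thread.

```python
import string

def escape_octet(octet: int) -> str:
    """Encode one octet (0..255) as a three-character escape such as '%20'."""
    if not 0 <= octet <= 0xFF:
        raise ValueError("an octet must be in the range 0..255")
    return "%{:02X}".format(octet)

def unescape_triplet(triplet: str) -> int:
    """Decode a three-character escape such as '%20' back to its octet value."""
    if len(triplet) != 3 or triplet[0] != "%":
        raise ValueError("expected '%' followed by two hex digits")
    if any(ch not in string.hexdigits for ch in triplet[1:]):
        raise ValueError("expected two hexadecimal digits after '%'")
    return int(triplet[1:], 16)
```

Note that `escape_octet(0)` happily produces `%00`; whether a null octet is acceptable in a URI is a separate question from how an octet is escaped.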
I think you got it backward: URIs are sequences of characters, not bytes. And in (X)HTML, "URI" is really "IRI" – the XHTML spec allows full Unicode (10646)...
One simple way to approximate this (if you didn't want to reuse someone else's code for validating HTML) would be to serialize your parsed HTML back to an...
Why is ADsafe allowing invalid HTML at all? It seems like requiring the HTML to be well-formed is a good first step in trying to understand how it will be...
... The set of HTML confusions is vast, but not infinite. An advantage here is that JSLint/ADsafe does not have to pass all valid HTML. I can be semidraconian...
... There are two problems here: (1) Identifying a safe subset of HTML/CSS and Javascript -- without obscure extensions like expression() (2) The other is...
The read-only aspect of JSLint is fairly unique and makes it somewhat more useful for certain applications. I support having a tool that does rewriting as an...
106
David Hopwood
david.hopwood@...
Oct 19, 2007 2:51 am
... The most common approach to preventing XSS attacks in user-generated content is not to allow HTML in that content, but to translate some simpler mark-up ...
It's tough to write a useful application for a browser if you can't manipulate html. On 18/10/2007, David Hopwood <david.hopwood@...>...
104
David Hopwood
david.hopwood@...
Oct 19, 2007 12:42 am
... The diversity of possible attacks on HTML, and the difficulty in keeping up with any changes in browsers, suggests to me that it may be a better idea...
RFC 3986 disallows the null byte in URIs, and says URIs are sequences of bytes, not characters, so 65533 is out of range. In your attribute whitelist, can't...
I'm not pasting. I'm reading the value of a textarea into JSLint directly using JavaScript. See http://crypto.stanford.edu/jsonrequest/nullbyte2.html It looks...
... I scan every line for null and other characters. I am guessing that the null is lost in the browser's paste process. In production, inspection will be done...
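A rough sketch of what scanning for null and other characters could look like. This is not JSLint's actual implementation (JSLint is written in JavaScript and its rules are not shown here); the function name, the set of allowed control characters, and the decision to flag U+FFFD are all assumptions made for illustration.

```python
# Control characters tolerated in source text for this sketch (an assumption).
ALLOWED_CONTROLS = {"\t", "\r"}

def find_suspect_chars(text: str):
    """Return (line, column, code point) for each suspicious character.

    Flags C0 control characters (including NUL), DEL, and U+FFFD, the
    replacement character that often appears where a byte failed to decode.
    """
    hits = []
    line, col = 1, 0
    for ch in text:
        if ch == "\n":
            line += 1
            col = 0
            continue
        col += 1
        if ch in ALLOWED_CONTROLS:
            continue
        code = ord(ch)
        if code < 0x20 or code == 0x7F or code == 0xFFFD:
            hits.append((line, col, code))
    return hits

# A null between "java" and "script", as in the attack discussed in this thread.
sample = '<iframe src="java\x00script:alert(42)"></iframe>'
report = find_suspect_chars(sample)
```

A check like this only helps if it sees the same characters the browser will see, which is the concern raised about the null being lost during pasting.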
Null byte between "java" and "script" passes JSLint on Firefox despite being an attack on IE: <iframe src="java&#65533;script:alert(42)"></iframe> Also: ...