json · JSON JavaScript Object Notation

Group Information

  • Members: 593
  • Category: Data Formats
  • Founded: Jul 19, 2005
  • Language: English

Messages 1600 - 1629 of 1968

#1600 From: John Cowan <cowan@...>
Date: Sat Feb 26, 2011 8:09 pm
Subject: Re: Re: JSON and the Unicode Standard
johnwcowan
Douglas Crockford scripsit:

> For JSON's purpose, Unicode is just a set of code points. It gives
> some, such as { and }, special meaning. But in strings, everything
> should simply be passed through.

So you are now conceding that it's invalid JSON to send through unpaired
surrogate code units, since they don't correspond to code points?
We discussed this a while back, and you were then (IIRC) claiming that
JSON allowed any arbitrary code unit, including unpaired surrogates.

--
John Cowan   http://ccil.org/~cowan  cowan@...
[P]olice in many lands are now complaining that local arrestees are insisting
on having their Miranda rights read to them, just like perps in American TV
cop shows.  When it's explained to them that they are in a different country,
where those rights do not exist, they become outraged.  --Neal Stephenson

#1601 From: Tatu Saloranta <tsaloranta@...>
Date: Sat Feb 26, 2011 8:59 pm
Subject: Re: Re: JSON and the Unicode Standard
cowtowncoder
On Fri, Feb 25, 2011 at 8:01 PM, johne_ganz <john.engelhart@...> wrote:
> --- In json@yahoogroups.com, Tatu Saloranta <tsaloranta@...> wrote:
...
> I have not seen a JSON implementation / parser that does such normalization.
>
> On the other hand, I very strongly suspect that whether or not such
> normalization is taking place is not up to the writer of that parser.  In

Yes.

> my particular case (JSONKit, for Objective-C), I pass the parsed JSON String
> to the NSString class to instantiate an object.
>
> I have ZERO control over what and how NSString interprets or manipulates the
> parsed JSON String that finally becomes the instantiated object that is
> ostensibly the same as the original JSON String used to create it.  It could
> be that NSString decides that the instantiated object is always converted to
> its precomposed form.  Objective-C is flexible enough that someone might
> decide to swizzle in some logic at run time that forces all strings to be
> precomposed before being handed off to the main NSString instantiation method.

OK. But in this case, would the JSON specification itself help a lot? I
understand that this is problematic, in that different platforms can
choose different defaults (and possibly opaque handling).

...
> I don't have a particular opinion on the matter one way or the other, other
> than to highlight the point that in many practical, real-world situations,
> whether or not such things take place may not be under the control of the JSON
> parser.
> I also suspect that it's one of those things that most people haven't really
> given a whole lot of consideration to; they just hand the parsed string over to
> "the Unicode string handling code", and that's that.  Most people may not
> realize that such string handling code may subtly alter the original Unicode
> text as a result (à la precomposing the string).

Right. And if the specification says nothing, that can expose real
complexities and ambiguities.

...
>> to tackle such complexity). While it would seem wrong to punt the
>> issue, there is the practical question of whether a full solution would
>> matter.
>
> I can guarantee you that the practical question of whether a full solution
> would matter will be answered the first time someone exploits it in a security
> vulnerable way that results in a major security fiasco.

I would be interested in how you would see this leading to security
issues, beyond the problems specific to string handling on particular platforms.
Or are you equally concerned in general about parser implementation
quality (which is understandable), above and beyond the question of what
the JSON specification says? At least to me it would seem more likely that
issues would lie outside the realm of the core specification itself.

> Then it will be with 20/20 hindsight, and the question will be "Why didn't
> anyone address (this behavior) that allowed two keys that were not bit for bit
> identical, but became identical after converting them to their precomposed form,
> and the security checks allowed the decomposed form through because it assumed
> that everything was in precomposed form?"

I can see how this can be problematic from the side of applications that
make assumptions about uniqueness. And also that it is important that
parsers clearly define how they handle things -- not all parsers
necessarily even check for uniqueness of identical byte patterns, much
less for normalization (and I think this is even allowed by the spec,
i.e. uniqueness checks are not mandated).

So in a way, it would be useful to have a bit more concrete examples of
known practical issues. The links below may give some insight, but it
would seem that they are typically platform-specific. That makes it
even harder to find shared solutions, or to recommend best practices.

> Unfortunately, the use of Unicode coupled with the fact that most JSON
> implementations are dependent on external code for their Unicode support means
> that this is an extremely non-trivial issue.  I can't think of a simple solution
> to the problem at the moment, other than acknowledging that it exists.
>
...
> You really ought to read:
>
> http://www.unicode.org/faq/security.html
> http://www.unicode.org/reports/tr36/#Canonical_Represenation
>
> Microsoft Security Bulletin (MS00-078): Patch Available for 'Web Server Folder
> Traversal' Vulnerability
> (http://www.microsoft.com/technet/security/bulletin/MS00-078.mspx,
> http://www.cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2000-0884)
>
> Creating Arbitrary Shellcode In Unicode Expanded Strings
> (http://www.net-security.org/article.php?id=144)
>
> There's a long history of "Those little Unicode details aren't really
> important" causing huge security problems later on.

Thank you. While I had heard about issues with requests containing
non-canonical UTF-8 code sequences (which were discussed as having such
issues), I admit I had not heard much about issues regarding
normalization.

-+ Tatu +-

#1602 From: Tatu Saloranta <tsaloranta@...>
Date: Sat Feb 26, 2011 9:03 pm
Subject: Re: Re: JSON and the Unicode Standard
cowtowncoder
On Sat, Feb 26, 2011 at 9:04 AM, John Cowan <cowan@...> wrote:
> Douglas Crockford scripsit:
>> --- In json@yahoogroups.com, David Graham <david.malcom.graham@...> wrote:
>>
>> > In my opinion, this means JSON parsers and generators must not perform
>> > normalization.  They must respect the data stored in the JSON byte stream
>> > as is.
>>
>> I agree.
>
> I agree in part.  JSON parsers MUST NOT normalize their inputs, for
> the reasons given upthread.  But JSON generators SHOULD generate
> normalization form C, and JSON parsers MAY check for it and
> warn their applications if it is not present.

This sounds reasonable to me as well.

-+ Tatu +-
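Cowan's proposed policy (parsers MUST NOT normalize, generators SHOULD emit NFC, parsers MAY warn) can be sketched in Python. This is a minimal illustration under stated assumptions, not the behaviour of any existing JSON library: `nfc_deep`, `generate`, `non_nfc_strings` and `parse_with_warnings` are hypothetical names, and `unicodedata.is_normalized` requires Python 3.8 or later. The parser side decodes the text exactly and only reports non-NFC strings; the generator side normalizes every string, keys included, before serializing.

```python
import json
import unicodedata

def nfc_deep(value):
    """Return a copy of a decoded JSON value with every string, keys included, in NFC."""
    if isinstance(value, str):
        return unicodedata.normalize("NFC", value)
    if isinstance(value, list):
        return [nfc_deep(v) for v in value]
    if isinstance(value, dict):
        # Keys that become identical after NFC collapse here; the last one wins.
        return {nfc_deep(k): nfc_deep(v) for k, v in value.items()}
    return value  # numbers, booleans and None pass through unchanged

def generate(value):
    """Generator side (SHOULD): serialize with every string in NFC."""
    return json.dumps(nfc_deep(value), ensure_ascii=False)

def non_nfc_strings(value, found=None):
    """Collect, without modifying anything, every string that is not in NFC."""
    if found is None:
        found = []
    if isinstance(value, str):
        if not unicodedata.is_normalized("NFC", value):
            found.append(value)
    elif isinstance(value, list):
        for v in value:
            non_nfc_strings(v, found)
    elif isinstance(value, dict):
        for k, v in value.items():
            non_nfc_strings(k, found)
            non_nfc_strings(v, found)
    return found

def parse_with_warnings(text):
    """Parser side: MUST NOT normalize, MAY warn about non-NFC strings."""
    data = json.loads(text)  # decoded as-is; no normalization is applied
    return data, non_nfc_strings(data)

data, warnings = parse_with_warnings('{"A\\u0308": 1}')
print(ascii(data), ascii(warnings))  # {'A\u0308': 1} ['A\u0308']
```

Note that the generator side can silently merge keys that were distinct before normalization, which is exactly the kind of collision discussed elsewhere in this thread.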

#1603 From: "Douglas Crockford" <douglas@...>
Date: Sun Feb 27, 2011 12:07 am
Subject: Re: JSON and the Unicode Standard
douglascrock...
--- In json@yahoogroups.com, John Cowan <cowan@...> wrote:
>
> Douglas Crockford scripsit:
>
> > For JSON's purpose, Unicode is just a set of code points. It gives
> > some, such as { and }, special meaning. But in strings, everything
> > should simply be passed through.
>
> So you are now conceding that it's invalid JSON to send through unpaired
> surrogate code units, since they don't correspond to code points?

No.


> We discussed this a while back, and you were then (IIRC) claiming that
> JSON allowed any arbitrary code unit, including unpaired surrogates.

Right.
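To make the unpaired-surrogate question concrete: Python's standard `json` module is one parser that takes the permissive view, decoding an escaped lone surrogate into a string that then cannot be encoded as strict UTF-8. The helper `decode_and_check` is a hypothetical name for illustration, and it assumes the JSON text is a single string.

```python
import json

def decode_and_check(text):
    """Decode a JSON string and report whether it is encodable as strict UTF-8."""
    value = json.loads(text)  # Python's json accepts escaped lone surrogates
    try:
        value.encode("utf-8")  # strict UTF-8 refuses surrogate code points
        return value, True
    except UnicodeEncodeError:
        return value, False

for text in ('"\\ud800"', '"\\ud83d\\ude00"'):
    value, ok = decode_and_check(text)
    print(text, [hex(ord(c)) for c in value], ok)
```

The first text yields a single code point 0xd800 that is not valid UTF-8; the second is a proper surrogate pair and decodes to one code point, 0x1f600.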

#1604 From: "mehdigholam@..." <mgholam@...>
Date: Sun Feb 27, 2011 7:47 am
Subject: fastJSON v1.4
mehdigholam...
Hello all,

Huge speed optimizations in fastJSON v1.4, now officially the fastest JSON
library on the .NET platform.

http://www.codeproject.com/KB/IP/fastJSON.aspx

#1605 From: Petri Lehtinen <petri@...>
Date: Mon Feb 28, 2011 7:31 pm
Subject: Jansson 2.0 released
akhern...
Jansson 2.0 is finally out. This is a new major release that is
(slightly) backwards incompatible with the older versions.

Changes since v1.3
------------------

* Backwards incompatible changes:

   - Unify unsigned integer usage in the API

   - Change JSON integer's underlying type to the widest signed integer
     type available

   - Change the maximum indentation depth to 31 spaces in the encoder

   - For future needs, add a flags parameter to all decoding functions

* New features

   - JSON value building (packing) functionality based on a format
     string.

   - Extraction and validation functionality based on a format string.

   - Error reporting enhancements.

   - Preprocessor constants that define the library version.

   - Custom memory allocation functions.

* Fix many portability issues, especially to ease building on Windows.

Download source: http://www.digip.org/jansson/releases/jansson-2.0.tar.gz
View documentation: http://www.digip.org/jansson/doc/2.0/
Changelog: http://www.digip.org/jansson/doc/2.0/changes.html#version-2-0
GitHub: https://github.com/akheron/jansson


What is Jansson?
----------------

Jansson is a C library for encoding, decoding and manipulating JSON data.
It features:

* Simple and intuitive API and data model
* Comprehensive documentation
* No dependencies on other libraries
* Full Unicode support (UTF-8)
* Extensive test suite

Jansson is licensed under the MIT license.

For more details, see http://www.digip.org/jansson/.


Petri Lehtinen

#1606 From: "johne_ganz" <john.engelhart@...>
Date: Wed Mar 2, 2011 3:57 am
Subject: Re: JSON and the Unicode Standard
johne_ganz
--- In json@yahoogroups.com, Tatu Saloranta <tsaloranta@...> wrote:
>
> On Fri, Feb 25, 2011 at 8:01 PM, johne_ganz <john.engelhart@...> wrote:
> > --- In json@yahoogroups.com, Tatu Saloranta <tsaloranta@> wrote:
> > my particular case (JSONKit, for Objective-C), I pass the parsed JSON String
> > to the NSString class to instantiate an object.
> >
> > I have ZERO control over what and how NSString interprets or manipulates the
> > parsed JSON String that finally becomes the instantiated object that is
> > ostensibly the same as the original JSON String used to create it.  It could
> > be that NSString decides that the instantiated object is always converted to
> > its precomposed form.  Objective-C is flexible enough that someone might
> > decide to swizzle in some logic at run time that forces all strings to be
> > precomposed before being handed off to the main NSString instantiation method.
>
> OK. But in this case, would the JSON specification itself help a lot? I
> understand that this is problematic, in that different platforms can
> choose different defaults (and possibly opaque handling).

It is my opinion that the answer is "Yes".  The standard must address some of
the issues introduced by the use of Unicode (see below).  Then there is the
practical real world issue that many JSON implementations are going to use
external code to manage the "Unicode part", and I think it's fair to say that
that external code is going to be focused on Unicode Standard compliance rather
than implementing semantics that are useful or even desired for RFC 4627
compliance.

Please don't get me wrong, I honestly wish that the whole thing could be
treated as some sort of ideal "extended ASCII" that was for all practical
purposes synonymous with "binary".  This would be much, much simpler.  But
that's not Unicode.

> ...
> > I don't have a particular opinion on the matter one way or the other, other
> > than to highlight the point that in many practical, real-world situations,
> > whether or not such things take place may not be under the control of the JSON
> > parser.
> > I also suspect that it's one of those things that most people haven't really
> > given a whole lot of consideration to; they just hand the parsed string over to
> > "the Unicode string handling code", and that's that.  Most people may not
> > realize that such string handling code may subtly alter the original Unicode
> > text as a result (à la precomposing the string).
>
> Right. And if the specification says nothing, that can expose real
> complexities and ambiguities.

Yes.  The use of Unicode, and the language surrounding the issue of Unicode in
RFC 4627, mean there are some very real complexities and ambiguities.  The
particular example that comes to mind is:

"What does it mean for two keys (or names in RFC 4627 nomenclature) to compare
equal?"

For example:

{ // Example #1
"Ä" : "launch nukes",
"Ä" : "do not launch nukes"
}

Do these keys "compare equal"?

{ // Example #2
"\u00C4" : "launch nukes",
"A\u0308" : "do not launch nukes"
}

How about this?
Is it "identical" to example #1?
Do the keys in example #2 compare equal?
Do the keys in example #2 compare equal to their respective keys in example #1?
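One way to make example #2 concrete, using Python's standard `json` and `unicodedata` modules as a stand-in for "a parser plus a Unicode library" (an illustrative choice, not a claim about any other implementation): the two keys are different code-point sequences, yet they are canonically equivalent, and a plain parse keeps both.

```python
import json
import unicodedata

# Example #2, written with JSON escapes so nothing in transit can recompose it.
doc = '{"\\u00C4": "launch nukes", "A\\u0308": "do not launch nukes"}'

pairs = json.loads(doc, object_pairs_hook=list)  # raw (key, value) pairs, in order
k1, k2 = pairs[0][0], pairs[1][0]

print(k1 == k2)                          # False: different code points
print(unicodedata.normalize("NFC", k1) ==
      unicodedata.normalize("NFC", k2))  # True: canonically equivalent
print(len(json.loads(doc)))              # 2: a plain dict keeps both keys
```

So the answer to "do they compare equal?" depends entirely on which notion of equality the consumer applies.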

From ECMA-262, "ECMAScript Language Specification", 5th Edition / December 2009,
page 11, section 6 "Source Text":

ECMAScript source text is represented as a sequence of characters in the Unicode
character encoding, version 3.0 or later. The text is expected to have been
normalised to Unicode Normalised Form C (canonical composition), as described in
Unicode Technical Report #15.
------

So let's say you're using a (Java|ECMA)Script editor to edit your JSON.
And the editor happens to follow this advice, as given in the ECMA-262 document.

What happens to example #1 in this case?
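A short Python sketch of that scenario, assuming (since the archive cannot show it) that the first key in example #1 is precomposed U+00C4 and the second is A followed by U+0308. An editor that saves the text in NFC turns two distinct keys into a duplicate, and which value survives is then up to the parser; Python's `json` keeps the last one.

```python
import json
import unicodedata

# Assumed spelling of example #1: precomposed U+00C4, then A + U+0308 (literal characters).
original = '{"\u00C4": "launch nukes", "A\u0308": "do not launch nukes"}'

# An editor following the ECMA-262 advice saves the text in NFC.
saved = unicodedata.normalize("NFC", original)

print(len(json.loads(original)))  # 2 distinct keys before saving
print(ascii(json.loads(saved)))   # {'\xc4': 'do not launch nukes'}: keys merged, last wins
```

A parser that keeps the first duplicate instead would yield "launch nukes"; nothing in RFC 4627 pins this down.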

> ...
> >> to tackle such complexity). While it would seem wrong to punt the
> >> issue, there is the practical question of whether a full solution would
> >> matter.
> >
> > I can guarantee you that the practical question of whether a full solution
> > would matter will be answered the first time someone exploits it in a security
> > vulnerable way that results in a major security fiasco.
>
> I would be interested in how you would see this leading to security
> issues, beyond the problems specific to string handling on particular platforms.

It doesn't necessarily have anything to do with a platform's string handling;
it has to do with Unicode.

{
"A": 1,
"A": 2,
"𝖠": 3,
"Å": 4,
"Å": 5,
"Å": 6,
"𝖠̊": 7
}

Unicode vastly complicates the above.  If one uses a Unicode-aware editor to
edit the above, it is perfectly fine for the editor to mangle it so that it is
no longer precisely the Unicode I pasted.  In fact, it wouldn't surprise me if
this groups.yahoo.com software washes it through a bit of Unicode processing and
what finally appears isn't exactly what I put in.
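The key list above mixes canonical and compatibility equivalents. The Python sketch below assumes a specific code point for each key (the archive rendering cannot confirm which ones were pasted) and counts how many distinct keys remain under raw comparison, NFC, and NFKC. `KEYS`, `codepoints` and `distinct_keys` are illustrative names.

```python
import unicodedata

# Assumed spellings of the keys above; these are illustrative choices.
KEYS = [
    "A",                 # LATIN CAPITAL LETTER A
    "\U0001D5A0",        # MATHEMATICAL SANS-SERIF CAPITAL A
    "\u00C5",            # LATIN CAPITAL LETTER A WITH RING ABOVE
    "A\u030A",           # A + COMBINING RING ABOVE
    "\u212B",            # ANGSTROM SIGN
    "\U0001D5A0\u030A",  # sans-serif A + COMBINING RING ABOVE
]

def codepoints(s):
    return " ".join(f"U+{ord(c):04X}" for c in s)

def distinct_keys(form=None):
    """Distinct keys compared raw (form=None) or after normalizing to `form`."""
    return {unicodedata.normalize(form, k) if form else k for k in KEYS}

for form in (None, "NFC", "NFKC"):
    keys = distinct_keys(form)
    print(form or "raw", len(keys), sorted(codepoints(k) for k in keys))
```

Under these assumptions, raw comparison sees 6 keys, NFC sees 4 (the precomposed, decomposed and Angstrom forms merge), and NFKC sees only 2, because compatibility normalization also folds the mathematical sans-serif A into plain A.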

One also needs to switch to the mindset of a security person, not someone who is
interested in writing a JSON specification or parser implementation.

Security people love to sell magic boxes and stick them in the network,
usually between you and the bad, evil internet.  One particular brand of voodoo,
known as the firewall, will occasionally sanitize or reject data from the bad,
outside internet.

Now imagine you're a security person, and you're buying or making one of these
magic boxes.  You know some of the issues involved and that various JSON
implementations are all over the map when it comes to how they deal with the
corner cases, and these corner cases can dramatically alter what it means for
two keys to "compare equal".  Which way are you going to come down on the issue?
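The mismatch can be sketched in a few lines of Python: a hypothetical gateway compares keys code point for code point, while a hypothetical backend normalizes keys to NFC after parsing. `BLOCKED_KEY`, `gateway_allows` and `backend_view` are invented for illustration; no real firewall product is being described.

```python
import json
import unicodedata

BLOCKED_KEY = "\u00C4"  # hypothetical key the gateway refuses to forward

def gateway_allows(text):
    """Hypothetical filter: checks top-level keys by exact code points."""
    return BLOCKED_KEY not in json.loads(text)

def backend_view(text):
    """Hypothetical backend: NFC-normalizes top-level keys after parsing."""
    return {unicodedata.normalize("NFC", k): v for k, v in json.loads(text).items()}

attack = '{"A\\u0308": "launch nukes"}'
print(gateway_allows(attack))               # True: the decomposed key is not BLOCKED_KEY
print(BLOCKED_KEY in backend_view(attack))  # True: the backend sees the blocked key anyway
```

Neither component is wrong by its own rules; the vulnerability lives in the disagreement between them.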

> Or are you equally concerned in general about parser implementation
> quality (which is understandable), above and beyond the question of what
> the JSON specification says? At least to me it would seem more likely that
> issues would lie outside the realm of the core specification itself.

I don't care about particular implementations.

Keep in mind there's a huge difference between what the spec says and what
people do.

The spec should be "right", for some strong definition of right.  It should also
not exist solely in some idealized vacuum, but be tempered with the practical,
real world issues that real world implementations of the standard have to deal
with.  It should represent "the best possible" at the time the standard was
forged, incorporating the wisdom and experience of those who actually have to
deal with and implement whatever the standard represents so that those who come
after, who may not have similar levels of experience or willingness to
thoroughly examine all the issues can use the standard with some degree of
safety and confidence.

> > Then it will be with 20/20 hindsight, and the question will be "Why didn't
> > anyone address (this behavior) that allowed two keys that were not bit for bit
> > identical, but became identical after converting them to their precomposed form,
> > and the security checks allowed the decomposed form through because it assumed
> > that everything was in precomposed form?"
>
> I can see how this can be problematic from the side of applications that
> make assumptions about uniqueness. And also that it is important that
> parsers clearly define how they handle things -- not all parsers
> necessarily even check for uniqueness of identical byte patterns, much
> less for normalization (and I think this is even allowed by the spec,
> i.e. uniqueness checks are not mandated).

I am in violent disagreement with this entire premiss.
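For contrast with parsers that skip uniqueness checks, here is a Python sketch of a duplicate-rejecting `object_pairs_hook`. `make_duplicate_checker` is a hypothetical helper; whether duplicates are judged by exact code points or after normalization is a policy choice, exposed here as the `form` parameter. Keys are compared in normalized form but returned exactly as they appeared, so the parser still does not alter the data.

```python
import json
import unicodedata

def make_duplicate_checker(form=None):
    """Build an object_pairs_hook that rejects repeated member names.

    form=None compares exact code points; form="NFC" also treats
    canonically equivalent names as duplicates.
    """
    def hook(pairs):
        seen = set()
        for key, _ in pairs:
            probe = unicodedata.normalize(form, key) if form else key
            if probe in seen:
                raise ValueError(f"duplicate key: {ascii(key)}")
            seen.add(probe)
        return dict(pairs)  # original, un-normalized keys
    return hook

print(json.loads('{"A": 1, "A": 2}'))  # {'A': 2}: the default silently keeps the last value

doc = '{"\\u00C4": 1, "A\\u0308": 2}'
print(len(json.loads(doc, object_pairs_hook=make_duplicate_checker())))  # 2
try:
    json.loads(doc, object_pairs_hook=make_duplicate_checker("NFC"))
except ValueError as exc:
    print(exc)  # duplicate key: 'A\u0308'
```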

> > There's a long history of "Those little Unicode details aren't really
> > important" causing huge security problems later on.
>
> Thank you. While I had heard about issues with requests containing
> non-canonical UTF-8 code sequences (which were discussed as having such
> issues), I admit I had not heard much about issues regarding
> normalization.

I would also recommend downloading the Unicode Standard
(http://www.unicode.org/versions/Unicode6.0.0/UnicodeStandard-6.0.pdf) and doing
a simple search for "security".  This will give you a list of pages that are
probably the most relevant to what I'm talking about.

And keep in mind that those issues are directly related to JSON because JSON is
"encoded as Unicode".  Anything that treats JSON as Unicode, such as a text
editor or a linked library like ICU, is going to follow the rules and
recommendations of the Unicode Standard.  This means in the real world, JSON is
likely to be washed through one of these libraries and be exposed to the Unicode
standard, and that standard DOES NOT require it to preserve the exact sequence
of bytes as Douglas Crockford thinks it should.

Even the official ECMA recommendation says that it expects "the source to be
normalised to Unicode Normalised Form C".  It's one thing to write code that
manipulates data and bytes that are (for some definition of) "local" to that
instance of the program at that point in time.  It's an entirely different thing
when you start slinging bytes between machines or need the bytes to be archived
and possibly processed by a different program.

#1607 From: "johne_ganz" <john.engelhart@...>
Date: Wed Mar 2, 2011 4:46 am
Subject: Re: JSON and the Unicode Standard
johne_ganz
--- In json@yahoogroups.com, John Cowan <cowan@...> wrote:
>
> johne_ganz scripsit:
>
> > In fact, for my parser (JSONKit), which is Objective-C based and uses
> > NSString to represent the JSON String objects, it is not practical
> > for me to create a JSON parser that "respects the data stored in the
> > JSON byte stream".  The NSString class makes no such guarantees in its
> > documentation, nor does the Unicode Standard.  It would be extremely
> > non-trivial for me to meet a "respects the data stored in the JSON
> > byte stream" requirement, at least in the sense that the behavior
> > is deterministic.
>
> Normalization is non-trivial, and I doubt if any existing Unicode library
> imposes it on all strings at creation/modification time.  Certainly ICU
> does not; it provides the ability to normalize, that's all.

The Foundation framework (specifically the NSString class) on Mac OS X and
iPhone / iPad does.  Not sure if 90+ million iPhones count for much, though.

In particular, [@"Ä" compare:@"Ä"] is zero, or "identical", whereas [@"Ä"
isEqual:@"Ä"] is "no".  Each has different semantics, and -compare: is preferred
when dealing with strings because it has the right semantics in that context.

As an analogy, it would be as if JavaScript behaved as:
if("Ä" == "Ä") // True
if("Ä" === "Ä") // False
in the same way that ("1" == 1) is true, but ("1" === 1) is false.

And just in case things get mangled along the way, the first string is "\u00c4"
and the second string is "\u0041\u0308".  In fact, if they do get mangled... I
think that should serve as a warning that these things can and do happen behind
your back when dealing with Unicode.
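Since rendered glyphs cannot reveal whether the two strings survived transport, one option is to print their code points and UTF-8 bytes. A minimal Python sketch (`describe` is an illustrative helper; `bytes.hex` with a separator needs Python 3.8+):

```python
def describe(s):
    """Show a string as code points plus its UTF-8 bytes, independent of rendering."""
    points = " ".join(f"U+{ord(c):04X}" for c in s)
    return f"{points}  utf-8={s.encode('utf-8').hex(' ')}"

for s in ("\u00c4", "\u0041\u0308"):
    print(describe(s))
# U+00C4  utf-8=c3 84
# U+0041 U+0308  utf-8=41 cc 88
```

The two strings differ in length and in every byte, even though most fonts draw them identically.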

#1608 From: "mehdigholam@..." <mgholam@...>
Date: Wed Mar 2, 2011 6:17 pm
Subject: fastJSON v1.5
mehdigholam...
Hello all,

Huge optimizations again for fastJSON, the .NET implementation.

http://www.codeproject.com/KB/IP/fastJSON.aspx

Cheers,

#1609 From: Dave Gamble <davegamble@...>
Date: Wed Mar 2, 2011 6:21 pm
Subject: Re: Re: JSON and the Unicode Standard
signalzerodb
Would it be too much to specify that key names are to be ASCII top-bit-unset
strings?

i.e. in the definition of an object, designate that the "string" there is a
"simplestring" which uses a restricted definition of char?

As far as I can see, this is the only case where the Unicode interpretation
is potentially dangerous.
In usage of strings as data, I believe they are to be delivered unprocessed
to the user of the data.

Maybe designate this json_littlebitmoresecure.

Cheers,

Dave.
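The proposed restricted "simplestring" rule for member names can be approximated with Python's `object_pairs_hook`. This is a hypothetical sketch; `ascii_keys_only` and `loads_strict` are illustrative names, not part of any proposal or library. The check runs on decoded names, so an escaped `\u00C4` in a key is rejected just like a literal one, while string values are passed through untouched.

```python
import json

def ascii_keys_only(pairs):
    """Hypothetical 'simplestring' rule: every object key must be 7-bit ASCII."""
    for key, _ in pairs:
        if not key.isascii():  # str.isascii() needs Python 3.7+
            raise ValueError(f"non-ASCII key: {ascii(key)}")
    return dict(pairs)

def loads_strict(text):
    """Parse JSON, applying the key rule to every object, nested ones included."""
    return json.loads(text, object_pairs_hook=ascii_keys_only)

print(ascii(loads_strict('{"name": "\\u00c4"}')))  # values may still be any Unicode
```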


#1610 From: Dave Gamble <davegamble@...>
Date: Wed Mar 2, 2011 6:22 pm
Subject: Re: Re: JSON and the Unicode Standard
signalzerodb
Better question: How does the ECMA/javascript spec limit variable names?
This seems to be the same question, in practical terms.

Dave.


#1611 From: John Cowan <cowan@...>
Date: Wed Mar 2, 2011 6:38 pm
Subject: Re: Re: JSON and the Unicode Standard
johnwcowan
Dave Gamble scripsit:

> Better question: How does the ECMA/javascript spec limit variable names?
> This seems to be the same question, in practical terms.

In JSON, unquoted keys are not permitted,
so both keys and values are strings.

--
Unless it was by accident that I had            John Cowan
offended someone, I never apologized.           cowan@...
         --Quentin Crisp                         http://www.ccil.org/~cowan

#1612 From: Dave Gamble <davegamble@...>
Date: Wed Mar 2, 2011 6:39 pm
Subject: Re: Re: JSON and the Unicode Standard
signalzerodb
To save people looking it up:

ECMA-262, section 7.6:

Two IdentifierName that are canonically equivalent according to the
Unicode standard are not equal unless they are represented by the
exact same sequence of code units (in other words, conforming
ECMAScript implementations are only required to do bitwise comparison
on IdentifierName values). The intent is that the incoming source text
has been converted to normalised form C before it reaches the
compiler.

ECMAScript implementations may recognize identifier characters defined
in later editions of the Unicode Standard. If portability is a
concern, programmers should only employ identifier characters defined
in Unicode 3.0.

There then follows a syntax definition, which expressly precludes reserved
keywords from being used as identifiers.

Looks like the most interesting attacks on json, from a security
viewpoint, would be using keywords as object member names.
Has anyone checked what happens if you do? I suspect the javascript
implementations would be the most at risk.

I think it's fairly clear that a JSON parser has ABSOLUTELY NO
BUSINESS poking around with actual data strings; Douglas has been very
clear that you are to pass them bit-identical to the recipient. On the
other hand, there's an argument for some kind of sanitization when it
comes to object member names.
I'm really tempted by the idea of a JSON-secure spec, which clamps
down on these details.

Arguing the Unicode details is decidedly NOT compatible with the
"spirit" of JSON, which Douglas has been very clear about; a
lightweight, simple, modern data representation.

I think it speaks to the merit of JSON as a format that you (@Johne)
want to consider the security details.
But I think what you need might well be a branch and a new spec?

I'm probably speaking way out of turn here, so please do accept my
apologies if I've overstepped any bounds.

Best,

Dave.





#1613 From: Dave Gamble <davegamble@...>
Date: Wed Mar 2, 2011 6:43 pm
Subject: Re: Re: JSON and the Unicode Standard
signalzerodb
 
On Wed, Mar 2, 2011 at 6:38 PM, John Cowan <cowan@...> wrote:
> Dave Gamble scripsit:
>
> > Better question: How does the ECMA/javascript spec limit variable names?
> > This seems to be the same question, in practical terms.
>
> In JSON, unquoted keys are not permitted,
> so both keys and values are strings.
>
I am aware of that :)
It occurs to me that, since the intention is that JSON text
deserializes into an object instance, which will in some cases be a
JavaScript object (where member variable names are used), there could
be a call for a stricter limitation on this. Does that make sense?

In other words, my question didn't concern the JSON spec; it concerned
the limitations imposed on member variable names in JavaScript, since
these seemed like a sensible set of limitations to apply to
JSON keys.

Dave.



>
> --
> Unless it was by accident that I had offended someone, I never apologized.
>         --Quentin Crisp
> John Cowan  cowan@...  http://www.ccil.org/~cowan

#1614 From: John Cowan <cowan@...>
Date: Wed Mar 2, 2011 6:45 pm
Subject: Re: Re: JSON and the Unicode Standard
johnwcowan
 
Dave Gamble scripsit:

> Looks like the most interesting attacks on json, from a security
> viewpoint, would be using keywords as object member names.
> Has anyone checked what happens if you do? I suspect the javascript
> implementations would be the most at risk.

Indeed they would, which is precisely why {foo: "bar"} is not conformant
JSON any more than {if: "bar"} would be, although the first is conformant
JavaScript and the second is not.  However, {"foo": "bar"} and {"if":
"bar"} are both good JSON and good JavaScript, and in JavaScript {foo:
"bar"} and {"foo": "bar"} mean the same thing.
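A minimal sketch of that distinction, assuming an ES5-or-later engine with a native JSON object (the helper name isStrictJson is hypothetical): JSON.parse accepts only quoted member names, reserved words included, while in a JavaScript object literal the quoted and unquoted spellings name the same key.

```javascript
// Hypothetical helper: true if the native JSON parser accepts the text.
function isStrictJson(text) {
  try {
    JSON.parse(text);
    return true;
  } catch (e) {
    return false; // JSON.parse throws a SyntaxError on non-conformant input
  }
}

// Quoted member names, including reserved words, are valid JSON.
console.log(isStrictJson('{"foo": "bar"}')); // true
console.log(isStrictJson('{"if": "bar"}'));  // true

// Unquoted member names are never JSON.
console.log(isStrictJson('{foo: "bar"}'));   // false
console.log(isStrictJson('{if: "bar"}'));    // false

// In a JavaScript object literal, foo and "foo" denote the same key.
const literal = { foo: "bar" };
const parsed = JSON.parse('{"foo": "bar"}');
console.log(literal.foo === parsed.foo);     // true
```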

--
John Cowan  cowan@...   http://ccil.org/~cowan
Promises become binding when there is a meeting of the minds and consideration
is exchanged. So it was at King's Bench in common law England; so it was
under the common law in the American colonies; so it was through more than
two centuries of jurisprudence in this country; and so it is today.
        --Specht v. Netscape

#1615 From: John Cowan <cowan@...>
Date: Wed Mar 2, 2011 7:20 pm
Subject: Re: Re: JSON and the Unicode Standard
johnwcowan
 
Dave Gamble scripsit:

> In other words, my question didn't concern the JSON spec, it concerned
> the limitations imposed on member variable names in javascript; since
> this seemed like it might be a sensible set of limitations to apply to
> JSON keys.

I don't think so.  In particular, it is often helpful to allow keys named
"$" or "#foo" or such.  In any case, the normalization rule for JavaScript
identifiers is "Don't".
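To illustrate, a short sketch assuming a native JSON.parse: names such as "$" and "#foo" are ordinary JSON keys even though "#foo" is not a valid JavaScript identifier, so a consumer simply reads them with bracket notation instead of dot notation.

```javascript
// Keys that are not valid JavaScript identifiers are still legal JSON keys.
const record = JSON.parse('{"$": "root", "#foo": 42, "if": true}');

// "$" happens to be a valid identifier, so dot notation works.
console.log(record.$);            // prints: root

// "#foo" is not an identifier; bracket notation is required.
console.log(record["#foo"]);      // prints: 42

// A reserved word works as a key as well.
console.log(record["if"]);        // prints: true

// Non-numeric string keys keep their insertion order.
console.log(Object.keys(record).join(", ")); // prints: $, #foo, if
```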

--
There is no real going back.  Though I          John Cowan
may come to the Shire, it will not seem         cowan@...
the same; for I shall not be the same.          http://www.ccil.org/~cowan
I am wounded with knife, sting, and tooth,
and a long burden.  Where shall I find rest?           --Frodo

#1616 From: John Cowan <cowan@...>
Date: Wed Mar 2, 2011 7:27 pm
Subject: Re: Re: JSON and the Unicode Standard
johnwcowan
 
johne_ganz scripsit:

> "What does it mean for two keys (or names in RFC 4627 nomenclature)
> to compare equal?"

Actually, it doesn't mean anything, because the RFC only says that the
keys of an object SHOULD (not MUST) be unique.  In practice, people
probably parse objects into some kind of string-based hash table,
so all duplicates but the last (or possibly the first) are ignored,
and nothing is done about normalization.
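What one common implementation actually does can be checked with a short sketch against the native JSON.parse (ES5 specifies this behavior; RFC 4627 itself leaves duplicates to the implementation): for a duplicated name the last value wins, and canonically equivalent but differently encoded names stay distinct because no normalization is applied.

```javascript
// Duplicate names: the later member overwrites the earlier one.
const dup = JSON.parse('{"a": 1, "a": 2}');
console.log(dup.a);                      // 2
console.log(Object.keys(dup).length);    // 1

// "\u00c4" (precomposed A-umlaut) and "A\u0308" (A + combining diaeresis)
// are canonically equivalent, yet JSON.parse keeps them as two keys.
const text = '{"\\u00c4": "precomposed", "A\\u0308": "decomposed"}';
const equiv = JSON.parse(text);
console.log(Object.keys(equiv).length);  // 2
console.log(equiv["\u00c4"]);            // precomposed
console.log(equiv["A\u0308"]);           // decomposed
```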

--
John Cowan    http://www.ccil.org/~cowan   <cowan@...>
     "Any legal document draws most of its meaning from context.  A telegram
     that says 'SELL HUNDRED THOUSAND SHARES IBM SHORT' (only 190 bits in
     5-bit Baudot code plus appropriate headers) is as good a legal document
     as any, even sans digital signature." --me

#1617 From: John Cowan <cowan@...>
Date: Wed Mar 2, 2011 7:29 pm
Subject: Re: Re: JSON and the Unicode Standard
johnwcowan
 
johne_ganz scripsit:

> And just in case things get mangled along the way, the first string is
> "\u00c4" and the second string is "\u0041\u0308".  In fact, if they
> do get mangled.... I think that should serve as a warning that these
> things can and do happen behind your back when dealing with Unicode.

Yup, it got here in Normalization Form C, using U+00C4 in both cases.
But that's email for you, and perhaps that's Apple for you too.  I'd stick
with isEqual.
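The same distinction can be sketched in JavaScript: === compares UTF-16 code units, much like the isEqual: behavior described earlier in the thread, so the two spellings differ, while comparing their NFC forms treats them as equal. Note that String.prototype.normalize only arrived in ECMAScript 2015, after this thread, and sameText is a hypothetical helper name.

```javascript
const precomposed = "\u00c4";       // LATIN CAPITAL LETTER A WITH DIAERESIS
const decomposed = "\u0041\u0308";  // "A" + COMBINING DIAERESIS

// Code-unit comparison: the strings are not identical.
console.log(precomposed === decomposed);             // false
console.log(precomposed.length, decomposed.length);  // 1 2

// Hypothetical helper: compare after normalizing both sides to NFC.
function sameText(a, b) {
  return a.normalize("NFC") === b.normalize("NFC");
}
console.log(sameText(precomposed, decomposed));      // true
```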

--
[W]hen I wrote it I was more than a little              John Cowan
febrile with foodpoisoning from an antique carrot       cowan@...
that I foolishly ate out of an illjudged faith          http://ccil.org/~cowan
in the benignancy of vegetables.  --And Rosta

#1618 From: "cmbgoud" <cmbgoud@...>
Date: Thu Mar 3, 2011 1:50 pm
Subject: XSD as JSON
cmbgoud
 
Is there any way to represent XSD as JSON (with the constraints intact)?

#1619 From: "johne_ganz" <john.engelhart@...>
Date: Fri Mar 4, 2011 12:18 am
Subject: Re: JSON and the Unicode Standard
johne_ganz
 
--- In json@yahoogroups.com, Dave Gamble <davegamble@...> wrote:
>
> Would it be too much to specify that key names are to be ASCII top-bit-unset
> strings?
>
> i.e. in the definition of an object, designate that the "string" there is a
> "simplestring" which uses a restricted definition of char?
>
> As far as I can see, this is the only case where the Unicode interpretation
> is potentially dangerous.
> In usage of strings as data, I believe they are to be delivered unprocessed
> to the user of the data.

Restricting keys to < U+0080 (i.e., ASCII) is one way.  Personally, I'm kinda
partial to something along the lines of this:

A JSON generator SHOULD only emit key names that are (some word smithed language
along the lines of possibly NFC, precomposed, possibly even NFKC, etc...)

A JSON generator SHOULD NOT emit keys that are (Unicode Equivalent) to one another
(http://en.wikipedia.org/wiki/Unicode_equivalence)  (... or some word smithed
language to this effect.)

A user/application MUST NOT depend on behavior that requires two (Unicode
Equivalent) keys that are not (word smithed language for concept of 'bit
identical').

The behavior is UNDEFINED for two keys that are (Unicode Equivalent) but not
(word smithed language for concept of 'bit identical').

Two keys compared to each other are equal if they are (some word smithed
language that incorporates the concept of Unicode Equivalence).  A JSON
implementation MAY perform normalization on parsed keys, but is not required to.
A JSON parser implementation MAY treat two keys that are (Unicode Equivalent)
but not (word smithed language for concept of 'bit identical') as different, but
a parser SHOULD strive for (just Unicode Equivalent).

When "round-tripping", a user/application MUST NOT depend on two (Unicode
Equivalent) keys that are not (word smithed language for concept of 'bit
identical') remaining unchanged.  (The intent is that a parser may perform
some form of normalization on the keys, so when they are round-tripped, an
unnormalized key may become normalized in the process.)
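A minimal sketch of that round-tripping caveat, assuming a parser (or a post-parse step) that chooses to normalize keys to NFC, as the proposal permits. normalizeKeys is a hypothetical helper, not part of any JSON library, and it relies on String.prototype.normalize (ES2015). String values are deliberately left alone.

```javascript
// Hypothetical post-parse step: rewrite every object key into NFC, recursively.
function normalizeKeys(value) {
  if (Array.isArray(value)) {
    return value.map(normalizeKeys);
  }
  if (value !== null && typeof value === "object") {
    const out = {};
    for (const key of Object.keys(value)) {
      // If two keys normalize to the same string, the later one silently wins;
      // this is one possible outcome of the "UNDEFINED" case, not a resolution.
      out[key.normalize("NFC")] = normalizeKeys(value[key]);
    }
    return out;
  }
  return value; // strings, numbers, booleans and null pass through untouched
}

const input = '{"A\\u0308": 1}'; // key written in decomposed form
const roundTripped = JSON.stringify(normalizeKeys(JSON.parse(input)));
const untouched = JSON.stringify(JSON.parse(input));

// The key comes back precomposed (U+00C4): Unicode Equivalent to the input,
// but no longer bit identical to what a non-normalizing parser would emit.
console.log(roundTripped === '{"\u00c4":1}'); // true
console.log(roundTripped === untouched);      // false
```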

In other words, this is probably fairly close to how things really are right
now; it just spells it out.  It also places the responsibility for the
"problem" squarely on the user/application that's generating the keys: it should
only generate, use, and manipulate keys that respect the fact that some Unicode
strings may be slightly modified from their original form but still be considered
equal.

And most of these issues can be avoided if you stick to plain ASCII code
points, or use code points that cannot be mutilated by Unicode normalization (i.e., don't use
Å, or U+212B, which can be transformed into either U+00C5 or U+0041 U+030A).
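A generator-side sketch of the two safeguards discussed above, assuming a producer that wants to check its keys before emitting JSON: isAsciiKey enforces Dave's "top-bit-unset" restriction, and findEquivalentKeys flags keys that are Unicode Equivalent (compared here via NFC) but not bit identical. Both helper names are hypothetical, and normalize() requires an ES2015 engine.

```javascript
// Dave's restriction: every code unit must be below U+0080.
function isAsciiKey(key) {
  return /^[\u0000-\u007f]*$/.test(key);
}

// Returns groups of distinct keys that share the same NFC form.
function findEquivalentKeys(obj) {
  const byNfc = new Map();
  for (const key of Object.keys(obj)) {
    const nfc = key.normalize("NFC");
    if (!byNfc.has(nfc)) byNfc.set(nfc, []);
    byNfc.get(nfc).push(key);
  }
  return [...byNfc.values()].filter((group) => group.length > 1);
}

// U+212B ANGSTROM SIGN, U+00C5, and U+0041 U+030A are canonically equivalent.
const obj = { "\u212B": 1, "\u00C5": 2, "A\u030A": 3, plain: 4 };

console.log(Object.keys(obj).every(isAsciiKey));       // false
console.log(findEquivalentKeys(obj).length);           // 1
console.log(findEquivalentKeys(obj)[0].length);        // 3
console.log("\u212B".normalize("NFC") === "\u00C5");   // true
console.log("\u212B".normalize("NFD") === "A\u030A");  // true
```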

#1620 From: "johne_ganz" <john.engelhart@...>
Date: Fri Mar 4, 2011 12:44 am
Subject: Re: JSON and the Unicode Standard
johne_ganz
 
--- In json@yahoogroups.com, Dave Gamble <davegamble@...> wrote:
>
> To save people looking it up:
>
> ECMA-262, section 7.6:
>
> Two IdentifierName that are canonically equivalent according to the
> Unicode standard are not equal unless they are represented by the
> exact same sequence of code units (in other words, conforming
> ECMAScript implementations are only required to do bitwise comparison
> on IdentifierName values). The intent is that the incoming source text
> has been converted to normalised form C before it reaches the
> compiler.
>
> ECMAScript implementations may recognize identifier characters defined
> in later editions of the Unicode Standard. If portability is a
> concern, programmers should only employ identifier characters defined
> in Unicode 3.0.

There is another relevant section (ECMA-262, 8.4 The String Type, p. 28):

When a String contains actual textual data, each element is considered to be a
single UTF-16 code unit. Whether or not this is the actual storage format of a
String, the characters within a String are numbered by their initial code unit
element position as though they were represented using UTF-16. All operations on
Strings (except as otherwise stated) treat them as sequences of undifferentiated
16-bit unsigned integers; they do not ensure the resulting String is in
normalised form, nor do they ensure language-sensitive results.

NOTE The rationale behind this design was to keep the implementation of Strings
as simple and high-performing as possible. The intent is that textual data
coming into the execution environment from outside (e.g., user input, text read
from a file or received over the network, etc.) be converted to Unicode
Normalised Form C before the running program sees it. Usually this would occur
at the same time incoming text is converted from its original character encoding
to Unicode (and would impose no additional overhead). Since it is recommended
that ECMAScript source code be in Normalised Form C, string literals are
guaranteed to be normalised (if source text is guaranteed to be normalised), as
long as they do not contain any Unicode escape sequences.
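The code-unit model described in that section is easy to observe, as a small sketch in any ES5 engine: length and charCodeAt count 16-bit code units rather than characters or code points, and nothing normalizes a string built from escape sequences.

```javascript
// Visually similar or single characters, different code-unit counts.
const precomposed = "\u00c4";   // 1 code unit
const decomposed = "A\u0308";   // 2 code units
const astral = "\uD834\uDD1E";  // U+1D11E MUSICAL SYMBOL G CLEF, a surrogate pair

console.log(precomposed.length); // 1
console.log(decomposed.length);  // 2
console.log(astral.length);      // 2

// charCodeAt exposes the raw 16-bit units, including each half of the pair.
console.log(astral.charCodeAt(0).toString(16)); // d834
console.log(astral.charCodeAt(1).toString(16)); // dd1e

// Escapes are taken as written: no normalization happens.
console.log(decomposed === "\u00c4"); // false
```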

> I think it's fairly clear that a JSON parser has ABSOLUTELY NO
> BUSINESS poking around with actual data strings; Douglas has been very
> clear that you are to pass them bit-identical to the recipient. On the
> other hand, there's an argument for some kind of sanitation when it
> comes to object member names.
> I'm really tempted by the idea of a JSON-secure spec, which clamps
> down on these details.

I disagree with your first statement.  The ECMA-262 standard, at least in my
opinion, tries to sidestep a lot of these issues.  It makes a fairly clear
distinction between "what happens inside the ECMA-262 environment" (which it
obviously has near-total control over) and "what happens outside the ECMA-262
environment".

IMHO, the ECMA-262 standard advocates that "stuff that happens outside the
ECMA-262 environment should be treated as if it is NFC".

Since the sine qua non of JSON is the interchange of information between
different environments and implementations, it must address any issues that can
and will cause difficulties.  Like it or not, the fact that it's Unicode means
these things can and will happen, and it's simply not practical to expect or
insist that every implementation treat JSON Strings as "just a simple array of
Unicode Code Points".

> Arguing the Unicode details is decidedly NOT compatible with the
> "spirit" of JSON, which Douglas has been very clear about; a
> lightweight, simple, modern data representation.

I completely agree that these details are NOT compatible with the "spirit" of
JSON.

But.... so what?  Unicode is not simple.  I'm not the one who made it that way,
but the way that RFC 4627 is written, you must deal with it.  There are ways RFC
4627 could have been written such that the JSON to be parsed is considered a
stream of 8-bit bytes, and therefore stripped of its Unicode semantics (if any).
However, it very clearly and plainly says "JSON text SHALL be encoded in
Unicode.", which pretty much kills the idea that you can just treat it as raw
bytes.

There's a saying about formalized standards:  The standard is right.  Even its
mistakes.

As an aside, there is an RFC for "Unicode Format for Network Interchange", RFC
5198 (http://tools.ietf.org/html/rfc5198).  It is 18 pages long.  RFC 4627 is
just 9 pages.

Actually, I would encourage people to read RFC 5198.  I'm not sure I agree with
all of it, but it goes over a lot of the issues I think are very relevant to
this conversation.  It's great background info if you're not familiar with the
details.

#1621 From: "toddkingham" <toddkingham@...>
Date: Fri Mar 11, 2011 10:16 pm
Subject: Do numeric values need to be double quoted when returned in objects?
toddkingham
 
Just as the subject suggests... Do you need to double-quote numeric values for them
to be valid JSON? JSONLint seems to validate the following string just fine:
{"age":35}

I believe the part of the spec that says all STRINGS must be double-quoted
doesn't apply to numeric values, as they are not strings... yes?


Thanks

#1622 From: John Cowan <cowan@...>
Date: Fri Mar 11, 2011 10:18 pm
Subject: Re: Do numeric values need to be double quoted when returned in objects?
johnwcowan
 
toddkingham scripsit:
> Just as the subject suggests... Do you need to double quote numeric
> values to be valid JSON? JSONlint seems to validate the following
> string just fine:  {"age":35}

So it should: strings are quoted, numbers are not.

> I believe the part in the spec that says all STRINGS must be double
> quoted doesn't apply to numeric values as they are not strings...yes?

Right.

--
John Cowan  cowan@...  http://ccil.org/~cowan
In computer science, we stand on each other's feet.
         --Brian K. Reid

#1623 From: "toddkingham" <toddkingham@...>
Date: Fri Mar 11, 2011 10:20 pm
Subject: Re: Do numeric values need to be double quoted when returned in objects?
toddkingham
 
Thanks.... that was fast :)

--- In json@yahoogroups.com, John Cowan <cowan@...> wrote:
>
> toddkingham scripsit:
> > Just as the subject suggests... Do you need to double quote numeric
> > values to be valid JSON? JSONlint seems to validate the following
> > string just fine:  {"age":35}
>
> So it should: strings are quoted, numbers are not.
>
> > I believe the part in the spec that says all STRINGS must be double
> > quoted doesn't apply to numeric values as they are not strings...yes?
>
> Right.
>
> --
> John Cowan  cowan@...  http://ccil.org/~cowan
> In computer science, we stand on each other's feet.
>         --Brian K. Reid
>

#1624 From: jonathan wallace <ninja9578@...>
Date: Fri Mar 11, 2011 10:57 pm
Subject: Re: Do numeric values need to be double quoted when returned in objects?
ninja9578
 
{"age":35}

{"age":"35"}

Both are legal JSON, but they are very different objects.  In the first, age is
the number 35; in the second, it is a string with the characters '3' and '5'.
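A quick sketch of that difference using the native JSON object (ES5 or later): the parsed values have different types, which shows up as soon as they are used, and JSON.stringify emits numbers bare and strings quoted.

```javascript
const asNumber = JSON.parse('{"age":35}');
const asString = JSON.parse('{"age":"35"}');

console.log(typeof asNumber.age); // number
console.log(typeof asString.age); // string

// The difference matters as soon as the value is used.
console.log(asNumber.age + 1);    // 36
console.log(asString.age + 1);    // 351 (string concatenation)

// Serializing goes the other way: numbers bare, strings quoted.
console.log(JSON.stringify({ age: 35 }));   // {"age":35}
console.log(JSON.stringify({ age: "35" })); // {"age":"35"}
```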
 
"People have always been impressed by the power of our example, not the example
of our power." - William Jefferson Clinton


[Non-text portions of this message have been removed]

#1625 From: "gtcul" <travis.culbreth@...>
Date: Mon Mar 28, 2011 9:15 pm
Subject: JSON to Java bean
gtcul
 
Hi All,

I'm sure this has been asked before, but I cannot find it here, so here is my
question:

Is there a simple way to convert/deserialize a JSON object into a Java bean? 
There is the toBean() method but that looks like it is only for primitives.  I
know you can convert a Java bean to a JSON object.

GSON has something I could use but I don't want to use GSON.

Any help or thoughts would be great.

Travis...

#1626 From: Tatu Saloranta <tsaloranta@...>
Date: Mon Mar 28, 2011 9:21 pm
Subject: Re: JSON to Java bean
cowtowncoder
 
On Mon, Mar 28, 2011 at 2:15 PM, gtcul
<travis.culbreth@...> wrote:
> Hi All,
>
> I'm sure this has been asked before but I cannot find it here, so here is my
> question
>
> Is there a simple way to convert/deserialize and JSON object into a Java bean?
> There is the toBean() method but that looks like it is only for primitives.  I
> know you can convert a Java bean to a JSON object.
>
> GSON has something I could use but I don't want to use GSON.

You are better off using a package that does data binding, such as
Jackson or GSON.

-+ Tatu +-

#1627 From: Petri Lehtinen <petri@...>
Date: Fri Apr 1, 2011 6:03 pm
Subject: Jansson 2.0.1 released
akhern...
 
Jansson 2.0.1 is out. This release fixes a few bugs, some of them
major and some minor.

The most important bug fixes are:

* Fix strict key checking in json_unpack() code
* Fix the return value of json_object_size()
* Fix a few segfaulters when custom memory management is used
* Make the JANSSON_VERSION_HEX constant usable

For full details, see the changelog.

Download source: http://www.digip.org/jansson/releases/jansson-2.0.1.tar.gz
View documentation: http://www.digip.org/jansson/doc/2.0/
Changelog: http://www.digip.org/jansson/doc/2.0/changes.html#version-2-0-1


What is Jansson?
----------------

Jansson is a C library for encoding, decoding and manipulating JSON data.
It features:

* Simple and intuitive API and data model
* Comprehensive documentation
* No dependencies on other libraries
* Full Unicode support (UTF-8)
* Extensive test suite

Jansson is licensed under the MIT license.

For more details, see http://www.digip.org/jansson/.


Petri Lehtinen

#1628 From: Andrea Giammarchi <andrea.giammarchi@...>
Date: Sun Apr 3, 2011 10:34 am
Subject: IE hosted objects and toJSON check
an_red...
 
This is mainly for Mr D, and it's about json2.js (probably the sans-eval version
as well).

If I create an object through a VBScript class definition and this object
has a hosted toJSON public method, json2.js's JSON.stringify will fail, while
the native IE8 and IE9 implementations do not.

The change should be straightforward: wherever there is a check on the
method type, "unknown" should be accepted as well.

In short:

// "object &&" guards against null, whose typeof is also "object"
if (object && typeof object === "object" &&
         (typeof object.toJSON === "function" ||
          typeof object.toJSON === "unknown")) {
     value = object.toJSON(key);
}

The code above already works as expected, but to be honest I would rather
make the check more portable:

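// Calls object[method] if it is a native function or an IE host method
// (typeof "unknown"). VBScript methods are sensitive to the number of
// arguments, so the key is passed only when one was actually supplied.
function invokeToJSON(object, method, key) {
     switch (typeof object[method]) {
         case "function":
         case "unknown":
             try {
                 return typeof key === "string" ?
                     object[method](key) :
                     object[method]();
             } catch (e) {
                 // notify something here
             }
     }
     // no callable method (or it threw): the result is undefined
}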

// so that ...

if (object && typeof object === "object") {
     value = invokeToJSON(object, "toJSON", key);
}

// then if value is not undefined ...

The function could be reused for toString and valueOf, if necessary, without
a key argument, since VBScript methods are sensitive to the number of arguments.

Any comments or improvements will be appreciated; an update on GitHub would be
more than welcome.

Best Regards,
     Andrea Giammarchi

P.S. "unknown" is IE-specific.
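For context, a hedged sketch of what the native ES5 JSON.stringify does with toJSON, the behavior json2.js emulates: it calls toJSON with the property name as its only argument (the index as a string for array elements, and the empty string for the top-level value). Plain JavaScript objects stand in here for the VBScript-hosted objects, which cannot be reproduced outside IE; tagged is a hypothetical helper.

```javascript
// Records the key each toJSON call receives.
const seen = [];
function tagged(label) {
  return {
    toJSON(key) {
      seen.push(key);
      return label + "@" + key;
    },
  };
}

const out = JSON.stringify({ first: tagged("a"), list: [tagged("b")] });
console.log(out);                 // {"first":"a@first","list":["b@0"]}
console.log(seen.join(", "));     // first, 0

// At the top level the key is the empty string.
console.log(JSON.stringify(tagged("root"))); // "root@"
```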


[Non-text portions of this message have been removed]

#1629 From: "Douglas Crockford" <douglas@...>
Date: Sun Apr 3, 2011 1:16 pm
Subject: Re: IE hosted objects and toJSON check
douglascrock...
 
--- In json@yahoogroups.com, Andrea Giammarchi <andrea.giammarchi@...> wrote:
>
> This is mainly for Mr D and it's about json2.js ( probably the sans eval as
> well )
>
> If I create an object through a VBScript class definition and this object
> has a hosted toJSON public method the parser will fail, while it does not
> fail with native IE8 and IE9 implementation.
>
> The change should be straight forward, where there is a check about the
> method type there should be an "unknown" as well.
>
> In few words:
>
> if (typeof object === "object" && (typeof object.toJSON === "function" ||
> typeof object.toJSON === "unknown")) {
>     value = object.toJSON(key);
> }
>
> Above code will already work as expected but to be honest I would rather
> make the check more portable
>
> function invokeToJSON(object, method, key) {
>     switch (typeof object[method]) {
>         case "function":
>         case "unknown":
>             try {
>                 return typeof key === "string" ?
>                     object[method]() :
>                     object[method](key)
>                 ;
>             } catch(e) {
>                 // notify something here
>             }
>     }
> }
>
> // so that ...
>
> if (typeof object === "object") {
>     value = invokeToJSON(object, "toJSON", key);
> }
>
> // then if value is not undefined ...
>
> the function could be reused for toString and valueOf, if necessary, without
> a key argument since VBScript methods are sensible to arguments length.
>
> Any comment/improvement will be appreciated, any update on github more than
> welcome.
>
> Best Regards,
>     Andrea Giammarchi
>
> P.S. "unknown" is IE specific

I am happy that you were able to get it working, but I have to tell you, I have
absolutely no interest in supporting VBScript or typeof unknown. I have a very
strong interest in seeing that crap go extinct.


Copyright © 2010 Yahoo! Inc. All rights reserved.