json · JSON JavaScript Object Notation


Group Information

  • Members: 590
  • Category: Data Formats
  • Founded: Jul 19, 2005
  • Language: English
Messages 1705 - 1734 of 1958
#1705 From: "Mark Joseph" <mark@...>
Date: Wed Sep 21, 2011 10:10 pm
Subject: Re: Is JSON suitable for embedded applications?
markjoseph_sc
 
I don't see the point of using something new for binary data. Clearly this is
not JSON. If you have an application that needs an encoding that handles
binary I suggest ASN.1 encoding (which is used by LDAP and SNMP):
http://en.wikipedia.org/wiki/Abstract_Syntax_Notation_One

ASN.1 handles binary and text data, it is extensible and easy to parse. There
is also existing software for it.



Mark Joseph, Ph.D.
President
P6R, Inc
408-205-0361
mark@...
Skype: markjoseph_sc
   _____

From: Stephan Beal [mailto:sgbeal@...]
To: json@yahoogroups.com
Sent: Wed, 21 Sep 2011 13:27:54 -0700
Subject: Re: [json] Is JSON suitable for embedded applications?






On Tue, Sep 6, 2011 at 6:30 PM, pozzugno <pozzugno@...> wrote:

   > **
   >
   > Could JSON data format be useful for me? Consider that my C compiler
   > doesn't support malloc()/free() functionalities, but only static and
   > automatic variable allocations. Is there a JSON C implementation suitable
   > for small embedded applications, without malloc()/free() and capable
   > reading/writing without storing the entire configuration in RAM?
   >
   > Any suggestion for other data format?
   >
   It sounds like binary is your best bet, but if you're willing to
   hack/experiment a little...

   i have a portable C89/C99 json library
   (http://whiki.wanderinghorse.net/wikis/cson/) which, with only minor changes,
   could use a custom allocator. i also just happen to have a custom allocator
   (http://fossil.wanderinghorse.net/repos/whalloc/index.cgi/wiki/whalloc_bt)
   which allows clients to give it a chunk of static (or otherwise-allocated)
   memory for it to malloc/realloc/free from (it was designed for _small_ apps
   which cannot/do not want to call malloc()). With that combination (again,
   with a small amount of hacking), if you know the approximate memory
   requirements in advance and you can spare it, i think it would work. You
   could also test it without your embedded device, and tweak/optimize the
   memory parameters, by using the same static memory block size on a dev
   environment. If you're interested in trying that out, send me your rough
   memory limits (off list: sgbeal googlemail com) and i'll see if i can throw
   it together. i can't guarantee it would work without a single malloc(), but
   (A) i think it could and (B) it might be fun to try. cson is fairly heavily
   optimized for a minimal number of calls to malloc(), that being an explicit
   design goal.

   i would also recommend looking at the jansson C library, but if i'm not
   sorely mistaken, cson is better optimized for low memory consumption.
   jansson supports, e.g., handling of cyclic structures, and that inherently
   adds memory costs. jansson also allows mutation of values after creating
   them, which (if i'm not mistaken) implicitly precludes certain pedantic
   malloc() reduction optimizations which cson does (i had to remove the
   mutator functions from the public API in order to be able to make some of
   those optimizations).

   --
   ----- stephan beal
   http://wanderinghorse.net/home/stephan/

   [Non-text portions of this message have been removed]





#1706 From: Don Owens <don@...>
Date: Wed Sep 21, 2011 11:57 pm
Subject: Re: Re: Universal Binary JSON Specification
regexman
 
I've seen very large numbers used in JSON.  In Perl, that can be represented
as a Math::BigInt object.  And that is the way I have implemented it in my
JSON module for Perl (JSON::DWIW).  Python has arbitrary length integers
built-in.  For my own language that I'm working on, I'm using libgmp in C to
handle arbitrary length integers.

JSON is used as a data exchange format.  I want to be able to do a
roundtrip, e.g., Python -> encoded -> Python with native integers (with
arbitrary length in this case).  In JSON, this just works, as far as the
encoding is concerned.  I see the need for this in any binary JSON format as
well.  If a large number is represented as a string, then on the decoding
side, you don't know if that was a number or a string (just because it looks
like a number doesn't mean that the sender means it's a number).  If, when
decoding JSON, the library can't handle large numbers, it has to throw an
error anyway.  The same should go for binary JSON.
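To make the round-trip point concrete, here is a minimal sketch using Python's standard `json` module. It only illustrates the text-JSON behaviour described above; the value `2**100` and the key `"n"` are arbitrary choices for the example.

```python
import json

big = 2**100  # well beyond the 64-bit range

# Sent as a JSON number, the value comes back as a native integer.
encoded = json.dumps({"n": big})
decoded = json.loads(encoded)
assert decoded["n"] == big and isinstance(decoded["n"], int)

# Sent as a string, the digits survive, but the receiver can no longer
# tell whether the sender meant a number or text that looks like one.
as_text = json.loads(json.dumps({"n": str(big)}))
assert isinstance(as_text["n"], str)
```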

./don


On Wed, Sep 21, 2011 at 11:58 AM, rkalla123 <rkalla@...> wrote:

> **
>
>
> Don,
> Interesting point. Stephan and I had a discussion this morning about the
> portability of the numeric types across most (all) platforms breaking
> down when it comes to 64-bit integers and JavaScript/C89. Going beyond
> that to add an arbitrary length number seems to me like the same
> concerns apply, only more so.
> I think one of the beauties of JSON was that every construct presented
> in the spec could be modeled in every language easily and immediately. I
> have very little familiarity with arbitrarily long integer and decimal
> numbers, but can the same be said for them and all the languages making
> heavy use of JSON?
> I think as a workaround to this, people utilizing the binary spec and
> working with > 64-bit numbers would be able to store and transfer the
> values as Strings. Maybe not optimal, but me expanding the spec without
> a clearer understanding of the goals and compatibility implications
> would be worse.
> Thank you for surfacing this issue.
>
> --- In json@yahoogroups.com, Don Owens <don@...> wrote:
> >
> > What happens if you need to encode an integer larger than 64 bits?
> > Shouldn't there be a way to encode a large integer as a buffer with a byte
> > length? I don't think the JSON spec puts a limit on the size of an integer.
> > If there is no way to encode large integers in your spec, people will
> > invent their own way of doing it -- it's better to have a standard way of
> > doing it. I think you should just have another type, e.g., bigint, that is
> > something like a byte count (the way string has) and a set of bytes in
> > big-endian order representing the integer.
> >
> > ./don
> >
> > On Wed, Sep 21, 2011 at 6:58 AM, rkalla123 rkalla@... wrote:
> >
> > > **
> > >
> > >
> > >
> > >
> > > Stephan,
> > >
> > > Great feedback, these are exactly the kind of nuances I wanted to uncover
> > > and discuss.
> > >
> > > When I dug through the language specs looking for the most well-supported
> > > numeric types before settling on the given 4, it seemed that, of all the
> > > ones I checked, JavaScript was the only one that didn't have native int64
> > > support, as you mentioned.
> > >
> > > The Chrome team suggests treating it like two int32s:
> > > http://code.google.com/p/v8/issues/detail?id=1339
> > >
> > > I found no way to deserialize JavaScript objects directly from a
> > > server-provided byte stream, which I anticipate means that this binary
> > > format doesn't benefit users in the server-to-browser communication
> > > workflow (which is already highly optimized for text-based JSON) but
> > > rather as a general application binary interchange format, like values
> > > stored as files on the server or processes communicating with one another.
> > >
> > > It was my thinking that this leaves the int64 inclusion in the binary spec
> > > in a relatively safe position, as the primary use case will be between
> > > languages that support 64-bit ints.
> > >
> > > Said another way, I certainly see your point, but I am trying to avoid a
> > > "cut off your nose to spite your face" situation with forward-thinking
> > > enhancements to JavaScript engines in the next few years that I assume will
> > > eventually give us 64-bit integers as the JS spec advances.
> > >
> > > I would hate then to be prompted to add 64-bit integers back to the binary
> > > spec after it has been out and in circulation for a few years.
> > >
> > > ASIDE: If I am missing a native way to reconstitute JavaScript objects from
> > > a server-provided byte stream and using this binary format for
> > > server-browser communication is an optimized reality, please correct me. I
> > > was unable to dig up methods to do this.
> > >
> > > --- In json@yahoogroups.com, Stephan Beal sgbeal@ wrote:
> > >
> > > >
> > > > On Wed, Sep 21, 2011 at 3:15 PM, rkalla123 rkalla@ wrote:
> > > >
> > > > > **
> > > > >
> > > > > The only difference from JSON being that "Number" is broken out into:
> > > > > int32, int64 and double types for the purposes of making parsing of the
> > > > >
> > > >
> > > > Keep in mind that JSON does not specify any required numeric precision, and
> > > > some platforms cannot use int64. (JavaScript specifies 53 bits of precision,
> > > > in case it matters.) e.g. in C89 there is no _portable_ int64 construct
> > > > (that was introduced with C99, but lots of projects still use/require C89
> > > > because of the very different levels of C99 support in various compilers). i
> > > > know that Java is everyone's special baby, but some of us actually write
> > > > JSON-consuming/producing C89 code. In the world of C++, Google's v8
> > > > JavaScript engine doesn't support 64-bit integers: numbers >32 bits need to
> > > > be doubles on that platform.
> > > >
> > > > In any case, i currently have a use case which will eventually require some
> > > > type of binary support, and i will be reading through what you've posted.
> > > > Thanks for sharing :).
> > > >
> > > > Happy Hacking!
> > > >
> > > >
> > > > --
> > > > ----- stephan beal
> > > > http://wanderinghorse.net/home/stephan/
> > > >
> > > >
> > > > [Non-text portions of this message have been removed]
> > > >
> > >
> > >
> > >
> >
> >
> >
> > --
> > Don Owens
> > don@...
>
> >
> >
> > [Non-text portions of this message have been removed]
> >
>
> [Non-text portions of this message have been removed]
>
>
>



--
Don Owens
don@...



#1707 From: "rkalla123" <rkalla@...>
Date: Thu Sep 22, 2011 2:50 am
Subject: Re: Universal Binary JSON Specification
rkalla123
 
Don,

I see your point. The way I understand it is that this would require 2 new data
types, effectively BigInt and BigDecimal.

So say something along these lines:

bigint - marker 'G'
[G][129][129 big-endian ordered bytes representing a BigInt]

bigdouble - marker 'W'
[W][222][222 big-endian ordered bytes representing a BigDecimal]
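A minimal sketch of the proposed bigint layout, to make the byte order concrete. Everything beyond the `[G][length][bytes]` shape is an assumption made for illustration: the length is taken to be a single byte, the payload is taken to be two's complement, and the function names `encode_bigint` and `decode_bigint` are hypothetical.

```python
def encode_bigint(value: int) -> bytes:
    # A byte count that always holds the value in two's complement
    # (assumption: the proposal does not say how the sign is stored).
    size = (value.bit_length() + 8) // 8
    if size > 255:
        raise ValueError("value does not fit a one-byte length field")
    return b"G" + bytes([size]) + value.to_bytes(size, "big", signed=True)


def decode_bigint(data: bytes) -> int:
    if data[:1] != b"G":
        raise ValueError("not a bigint marker")
    size = data[1]
    payload = data[2:2 + size]
    if len(payload) != size:
        raise ValueError("truncated bigint")
    return int.from_bytes(payload, "big", signed=True)


assert decode_bigint(encode_bigint(2**100)) == 2**100
```

A one-byte length caps the payload at 255 bytes; a real spec would have to decide whether that is enough.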


Thoughts?


#1708 From: Tatu Saloranta <tsaloranta@...>
Date: Thu Sep 22, 2011 6:17 am
Subject: Re: Universal Binary JSON Specification
cowtowncoder
 
On Wed, Sep 21, 2011 at 10:44 AM, Stephan Beal <sgbeal@...> wrote:
> On Wed, Sep 21, 2011 at 6:45 PM, Tatu Saloranta <tsaloranta@...>wrote:
>
>> **
>> You might be interested in an existing such specification called
>> Smile: http://wiki.fasterxml.com/SmileFormatSpec
>> which was specified about a year ago, has Java and C implementations,
>> and used by a few projects/products like ElasticSearch.
>>
>
> Thank you for that. Smile's requirement that impls be capable of supporting
> "shared strings" seems a bit draconian to me, though. That adds non-trivial
> parser/writer infrastructure which would otherwise not be required
> (especially in C, which doesn't have standard containers we can use to store
> such strings/references in).

Correct, the format does not aim for minimal implementation complexity.
But for space efficiency it is pretty much a requirement, as a small set
of names is typically reused over and over again in data streams, and
back references can reduce size significantly.
For what it is worth, this is an optional feature for the encoder.
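The back-reference idea can be sketched roughly as follows. This is not Smile's wire format; the token tuples and the function names `encode_names` and `decode_names` are hypothetical and only show why repeated names shrink and why a decoder has to keep a table of names it has already seen.

```python
def encode_names(names):
    seen = {}    # name -> index of its first occurrence
    tokens = []
    for name in names:
        if name in seen:
            tokens.append(("ref", seen[name]))   # short back reference
        else:
            seen[name] = len(seen)
            tokens.append(("name", name))        # full name, sent once
    return tokens


def decode_names(tokens):
    table = []   # the decoder must remember every name it has seen
    out = []
    for kind, value in tokens:
        if kind == "name":
            table.append(value)
            out.append(value)
        else:
            out.append(table[value])
    return out


names = ["id", "name", "id", "name", "id"]
assert decode_names(encode_names(names)) == names
```

An encoder can skip the table and always emit full names, but a decoder that may receive back references cannot.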

-+ Tatu +-

#1709 From: Tatu Saloranta <tsaloranta@...>
Date: Thu Sep 22, 2011 6:20 am
Subject: Re: Re: Universal Binary JSON Specification
cowtowncoder
 
On Wed, Sep 21, 2011 at 7:50 PM, rkalla123 <rkalla@...> wrote:
> Don,
>
> I see your point. The way I understand it is that this would require 2 new
> data types, effectively BigInt and BigDecimal.
>
> So say something along these lines:
>
> bigint - marker 'G'
> [G][129][129 big-endian ordered bytes representing a BigInt]
>
> bigdouble - marker 'W'
> [W][222][222 big-endian ordered bytes representing a BigDecimal]
>
>
> Thoughts?

Yes, to properly support the full JSON data set, one should provide
BigInteger/BigDecimal, either as binary representations or by embedding
a textual representation.

In practice I doubt it is needed all that often; BSON for example does
not support such types (unless I misread
http://groups.google.com/group/bson/browse_thread/thread/d298c0ab50b01e4?pli=1).

-+ Tatu +-

#1710 From: Stephan Beal <sgbeal@...>
Date: Thu Sep 22, 2011 7:09 am
Subject: Re: Re: Universal Binary JSON Specification
stephan.beal
 
On Thu, Sep 22, 2011 at 4:50 AM, rkalla123 <rkalla@...> wrote:

> **
>
> bigdouble - marker 'W'
> [W][222][222 big-endian ordered bytes representing a BigDecimal]
>
> Thoughts?
>
Don's point is valid but it assumes that every environment has this support,
and that's not the case. Maybe his use cases/environments have that. When
writing generic code, however, "big numbers" don't exist.

--
----- stephan beal
http://wanderinghorse.net/home/stephan/



#1711 From: Stephan Beal <sgbeal@...>
Date: Thu Sep 22, 2011 7:11 am
Subject: Re: Universal Binary JSON Specification
stephan.beal
 
On Thu, Sep 22, 2011 at 8:17 AM, Tatu Saloranta <tsaloranta@...>wrote:

> On Wed, Sep 21, 2011 at 10:44 AM, Stephan Beal <sgbeal@...> wrote:
> > Thank you for that. Smile's requirement that impls be capable of supporting
> > "shared strings" seems a bit draconian to me, though. That adds non-trivial
> > ...
>
> Correct, format does not aim for minimal complexity of implementations.
> ...This is an optional feature for encoder for what it is worth.
>

But for the decoder it's required, or at least that's how i understood the
docs.

--
----- stephan beal
http://wanderinghorse.net/home/stephan/



#1712 From: Stephan Beal <sgbeal@...>
Date: Thu Sep 22, 2011 7:12 am
Subject: Re: Re: Universal Binary JSON Specification
stephan.beal
 
On Thu, Sep 22, 2011 at 8:20 AM, Tatu Saloranta <tsaloranta@...>wrote:

> **
> Yes, to properly support full JSON data set, one should provide
> BigInteger/-Decimal...
>

i don't agree: JSON does not specify an integer size, which means that
supporting only an 8-bit int is still valid JSON.

--
----- stephan beal
http://wanderinghorse.net/home/stephan/



#1713 From: "rkalla123" <rkalla@...>
Date: Thu Sep 22, 2011 1:43 pm
Subject: Re: Universal Binary JSON Specification
rkalla123
 
Stephan,

It reminds me of our conversation earlier about 64-bit. As you mentioned, Don
has a great point, but given how uncommon the data type is (I doubt the
majority of people using JSON would use it), and with my gut telling me to
wait, I think I am going to specify BigInt/BigDecimal support in the official
specification under a new section "Pending" and wait for feedback from more
people.

I am working hard to keep the spec so conceptually simple that it fits in a
tiny brain-pocket any time someone reads it, and they feel empowered immediately
to start getting work done.

This might mean some fringe items are not addressed at the beginning, but I'd
rather add them later under strong demand than add them now and have that
extra 10% of features make the format seem JUST complex enough that someone
skimming it, hoping for something simple to use, glazes over and loses
interest.

I really appreciate this dialog on the subject guys, it helps get all the
aspects out on the table early!


#1714 From: Don Owens <don@...>
Date: Thu Sep 22, 2011 2:15 pm
Subject: Re: Re: Universal Binary JSON Specification
regexman
 
Yes, that is what I was getting at.  But see comments embedded.

On Wed, Sep 21, 2011 at 7:50 PM, rkalla123 <rkalla@...> wrote:

> **
>
>
> Don,
>
> I see your point. The way I understand it is that this would require 2 new
> data types, effectively BigInt and BigDecimal.
>
> So say something along these lines:
>
> bigint - marker 'G'
> [G][129][129 big-endian ordered bytes representing a BigInt]
>
It should be mentioned that they are signed ints, but doing two's
complement and such is probably too much work.  Maybe just specify that the
first bit always represents the sign (0 for no sign, 1 for minus).
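A minimal sketch of that sign-bit idea, covering only the payload bytes that would follow the marker and length. The function names `encode_sign_magnitude` and `decode_sign_magnitude` are hypothetical, and the sketch assumes the sign occupies the top bit of the first byte with the magnitude in the remaining bits.

```python
def encode_sign_magnitude(value: int) -> bytes:
    magnitude = abs(value)
    # One extra bit is reserved at the top of the first byte for the sign.
    size = (magnitude.bit_length() + 8) // 8
    raw = bytearray(magnitude.to_bytes(size, "big"))
    if value < 0:
        raw[0] |= 0x80
    return bytes(raw)


def decode_sign_magnitude(raw: bytes) -> int:
    negative = bool(raw[0] & 0x80)
    magnitude = int.from_bytes(bytes([raw[0] & 0x7F]) + raw[1:], "big")
    return -magnitude if negative else magnitude


assert encode_sign_magnitude(-5) == b"\x85"
assert decode_sign_magnitude(encode_sign_magnitude(-(2**100))) == -(2**100)
```

One consequence of this layout is that a "negative zero" byte pattern exists; a spec would need to say whether decoders accept it.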


> bigdouble - marker 'W'
> [W][222][222 big-endian ordered bytes representing a BigDecimal]
>
>
BigDecimal should probably be renamed to something like BigFloat, since
decimal is ambiguous (used to mean base-10 and floating point).  I'm less
familiar with large floating point, but I think a floating point number
should consist of a sign bit plus two integers (one for the
mantissa/significand and one for the exponent).  In the interest of space
savings, I think the sign bit should just be included in the exponent and
order things so they look similar to the IEEE 754 spec, e.g.,

[W][3][3 big-endian ordered bytes (where first bit is sign bit) of
exponent][222][222 big-endian ordered bytes of mantissa]



> Thoughts?
>

In terms of the documentation, I think the big integers and floats should be
qualified with a "should implement" instead of a "must implement", since, as
others have mentioned, not every encoder and decoder will be able to handle
these.  I think this matches JSON implementations well.  If an encoder does
not handle large numbers, it could just throw an error, just as it should
throw an error now if an oversized number is encountered in JSON.  The same
goes for the decoder side.  If there is no good way to represent a large
number in the language you are working in, throw an error indicating that
the number is too large.

Have you looked into using variable-length integers for length specifiers?
  If you have a lot of short strings (or big numbers, etc.) in your data,
these could significantly reduce your space usage (at the cost of more
complexity for the developer and CPU).  There should be a balance between
space efficiency and complexity.  Thoughts?
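For reference, one common variable-length scheme stores 7 bits per byte, least significant group first, and uses the high bit as a "more bytes follow" flag. The sketch below assumes that scheme; nothing in this thread fixes a particular encoding, and the names `encode_varint` and `decode_varint` are hypothetical.

```python
def encode_varint(n: int) -> bytes:
    if n < 0:
        raise ValueError("lengths are non-negative")
    out = bytearray()
    while True:
        low7 = n & 0x7F
        n >>= 7
        if n:
            out.append(low7 | 0x80)   # high bit set: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def decode_varint(data: bytes):
    """Return (value, bytes_consumed)."""
    value = 0
    for i, byte in enumerate(data):
        value |= (byte & 0x7F) << (7 * i)
        if not byte & 0x80:
            return value, i + 1
    raise ValueError("truncated varint")


assert len(encode_varint(20)) == 1      # a short string costs one length byte
assert decode_varint(encode_varint(300)) == (300, 2)
```

Lengths below 128 take one byte instead of a fixed four, at the price of a loop in every reader and writer.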






--
Don Owens
don@...



#1715 From: Don Owens <don@...>
Date: Thu Sep 22, 2011 2:21 pm
Subject: Re: Re: Universal Binary JSON Specification
regexman
 
I didn't mean to imply that every environment has this support -- I'm very
aware that most environments do not.  However, the same issue arises when
using JSON.  If you encounter a number that is too large to fit in your
available integer/float sizes, you should return an error (and you should
definitely check this).  It should be the same in the case of binary JSON --
if the environment can't handle numbers of that size, return an error.
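As an illustration of the "return an error" behaviour for text JSON, here is a sketch of a decoder that pretends its environment tops out at int64. It relies on the `parse_int` hook of Python's standard `json.loads`; the names `int64_only` and `loads_int64` are hypothetical.

```python
import json

INT64_MIN, INT64_MAX = -2**63, 2**63 - 1


def int64_only(text: str) -> int:
    # Called by json.loads with the literal text of each JSON integer.
    value = int(text)
    if not INT64_MIN <= value <= INT64_MAX:
        raise ValueError(f"integer out of int64 range: {text}")
    return value


def loads_int64(document: str):
    return json.loads(document, parse_int=int64_only)


assert loads_int64('{"n": 9223372036854775807}')["n"] == INT64_MAX
```

A binary decoder could apply the same check at the point where it meets a big-number type it cannot represent.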

My concern is that I shouldn't have to switch formats or hack a format in
order to do round trips with data that should fit in the "golden data
structure".

./don




--
Don Owens
don@...



#1716 From: Don Owens <don@...>
Date: Thu Sep 22, 2011 2:33 pm
Subject: Re: Re: Universal Binary JSON Specification
regexman
 
I forgot to add that encoders should only use the big number format if the
number is too big to fit in int64 (or int32, depending on which will be the
largest in the spec) or a double.  That way, if a decoder can't handle a
number larger than int64 anyway, it does not need to implement decoding of
big numbers -- you don't want a number that will fit in an int32 put into a
big number format anyway.
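That encoder rule can be sketched as a simple range check. The type names returned here are labels for illustration only, and `choose_integer_type` is a hypothetical helper; the actual markers would come from the spec.

```python
def choose_integer_type(value: int) -> str:
    # Use the smallest fixed-width type that holds the value and fall
    # back to the big-number format only when nothing else fits.
    if -2**31 <= value <= 2**31 - 1:
        return "int32"
    if -2**63 <= value <= 2**63 - 1:
        return "int64"
    return "bigint"


assert choose_integer_type(42) == "int32"
assert choose_integer_type(2**40) == "int64"
assert choose_integer_type(2**64) == "bigint"
```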



--
Don Owens
don@...



#1717 From: Raymond Reggers <raymond@...>
Date: Thu Sep 22, 2011 9:29 pm
Subject: Re: Re: Universal Binary JSON Specification
adaptivdesign
 
Hey all,

It might be worth taking a peek at the AMF0 and AMF3 protocols. The
AMF3 protocol makes a distinction between integer and number data. Taken
from http://osflash.org/documentation/amf3 :

Integer-data is probably the single most used item in AMF3. To save space
it is an integer that can be 1-4 bytes long. The first bit of the first
three bytes determine if the next byte is included (1) in this
integer-data or not (0). The last byte, if present, is read completely
(8 bits). The first bits are then removed from the first three bytes and
the remaining bits concatenated to form a big-endian integer.

The integer has a maximum of 29 bits (3*7+8) and a value range of
-268435456 (int.MIN_VALUE>>3) to 268435455 (int.MAX_VALUE>>3).

The integer is negative if it is the full 29 bits long and the first bit
is set (1). This uses two's complement notation and is therefore identical
to normal signed integer behaviour. So if you read the integer into a 32
bit integer, all you will need to do is extend the sign.

Examples:
0011 0101 = 53
1000 0001 0101 0100 = 212
1000 0110 1100 1010 0011 1111 = 107839
1111 1111 1111 1111 1111 1111 1111 1111 = -1
1100 0000 1000 0000 1000 0000 0000 0000 = -268435456
1011 1111 1111 1111 1111 1111 1111 1111 = 268435455
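
The decoding rule described in that excerpt can be sketched as below. This follows only the description quoted here, not any particular AMF library, and `decode_u29` is a hypothetical name. It is checked against the first four examples from the excerpt.

```python
def decode_u29(data: bytes):
    """Return (value, bytes_consumed) for a 1-4 byte AMF3-style integer."""
    value = 0
    for i in range(3):
        byte = data[i]   # raises IndexError if the input is truncated
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:
            return value, i + 1
    # The fourth byte contributes all 8 bits, giving 29 bits in total.
    value = (value << 8) | data[3]
    if value & 0x10000000:    # top (29th) bit set: negative number
        value -= 0x20000000   # sign-extend from 29 bits
    return value, 4


assert decode_u29(bytes([0x35])) == (53, 1)
assert decode_u29(bytes([0x81, 0x54])) == (212, 2)
assert decode_u29(bytes([0x86, 0xCA, 0x3F])) == (107839, 3)
assert decode_u29(bytes([0xFF, 0xFF, 0xFF, 0xFF])) == (-1, 4)
```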


On 22-9-2011 16:33, Don Owens wrote:
>
> I forgot to add that encoders should only use the big number format if the
> number is too big to fit in int64 (or int32, depending on which will
> be the
> largest in the spec) or a double. That way, if a decoder can't handle a
> number larger than int64 anyway, it does not need to implement decoding of
> big numbers -- you don't want a number that will fit in an int32 put
> into a
> big number format anyway.
>
> On Thu, Sep 22, 2011 at 7:15 AM, Don Owens <don@...
> <mailto:don%40regexguy.com>> wrote:
>
> > Yes, that is what I was getting at. But see comments embedded.
> >
> > On Wed, Sep 21, 2011 at 7:50 PM, rkalla123 <rkalla@...
> <mailto:rkalla%40gmail.com>> wrote:
> >
> >> **
> >>
> >>
> >> Don,
> >>
> >> I see your point. The way I understand it is that this would
> require 2 new
> >> data types, effectively BigInt and BigDecimal.
> >>
> >> So say something along these lines:
> >>
> >> bigint - marker 'G'
> >> [G][129][129 big-endian ordered bytes representing a BigInt]
> >>
> >> It should be mentioned that they are signed ints, but doing two's
> > complement and such is probably too much work. Maybe just specify
> that the
> > first bit always represents the sign (0 for no sign, 1 or minus).
> >
> >
> >> bigdouble - marker 'W'
> >> [W][222][222 big-endian ordered bytes representing a BigDecimal]
> >>
> >>
> > BigDecimal should probably be renamed to something like BigFloat, since
> > decimal is ambiguous (used to mean base-10 and floating point). I'm less
> > familiar with large floating point, but I think a floating point number
> > should consist of a sign bit plus two integers (one for the
> > mantissa/significand and one for the exponent). In the interest of space
> > savings, I think the sign bit should just be included in the
> exponent and
> > order things so they look similar to the IEEE 754 spec, e.g.,
> >
> > [W][3][3 big-endian ordered bytes (where first bit is sign bit) of
> > exponent][222][222 big-endian ordered bytes of mantissa]
> >
> >
> >
> >> Thoughts?
> >>
> >
> > In terms of the documentation, I think the big integers and floats
> should
> > be qualified with a "should implement" instead of a "must
> implement", since,
> > as others have mentioned, not every encoder and decoder will be able to
> > handle these. I think this matches JSON implementations well. If an
> > encoder does not handle large numbers, it could just throw an error,
> just as
> > it should throw an error now if an oversized number is encountered
> in JSON.
> > The same goes for the decoder side. If there is no good way to
> represent a
> > large number in the language your are working in, throw an error
> indicating
> > that the number is too large.
> >
> > Have you looked into using variable-length integers for length
> specifiers?
> > If you have a lot of short strings (or big numbers, etc.) in your data,
> > these could significantly reduce your space usage (at the cost of more
> > complexity for the developer and CPU). There should be a balance between
> > space efficiency and complexity. Thoughts?
> >
> >
> >
> >>
> >>
> >
> >> --- In json@yahoogroups.com, Don Owens <don@...> wrote:
> >> >
> >> > I've seen very large numbers used in JSON. In Perl, that can be
> >> represented
> >> > as a Math::BigInt object. And that is the way I have implemented
> it in
> >> my
> >> > JSON module for Perl (JSON::DWIW). Python has arbitrary length
> integers
> >> > built-in. For my own language that I'm working on, I'm using
> libgmp in C
> >> to
> >> > handle arbitrary length integers.
> >> >
> >> > JSON is used as a data exchange format. I want to be able to do a
> >> > roundtrip, e.g., Python -> encoded -> Python with native integers
> (with
> >> > arbitrary length in this case). In JSON, this just works, as far
> as the
> >> > encoding is concerned. I see the need for this in any binary JSON
> format
> >> as
> >> > well. If a large number is represented as a string, then on the
> decoding
> >> > side, you don't know if that was a number or a string (just
> because it
> >> looks
> >> > like a number doesn't mean that the sender means it's a number). If,
> >> when
> >> > decoding JSON, the library can't handle large numbers, it has to
> throw
> >> an
> >> > error anyway. The same should go for binary JSON.
> >> >
> >> > ./don
> >>
> >>
> >>
> >
> >
> >
> > --
> > Don Owens
> > don@...
> >
> >
>
> --
> Don Owens
> don@...
>
> [Non-text portions of this message have been removed]
>
>




#1718 From: Milo Sredkov <miloslav@...>
Date: Thu Sep 22, 2011 9:47 pm
Subject: Re: Re: Universal Binary JSON Specification
milosredkov
 
Hello Riyad, Stephan, Don, Tatu, and all group members,

I recently analysed about 70 of the libraries linked from json.org (almost
all listed in the C++, C, Java, Python, Haskell, JavaScript, Ruby, C#, PHP,
and Lisp sections) and would like to share some opinions about the presented
specification, and also about some of the topics that arose in the discussion
so far.

First, I'm really happy to see people trying to do cool things in favour of
the JSON community – a simple and efficient binary JSON representation is
for sure a cool thing from which we can all benefit. Although you will
probably need to do more work in order to show everyone that this format is
*the one*, initiating a discussion is probably the right thing to do.

IMHO there are two things, already pointed out by the others, which I find
disturbing. First, as other binary JSON representations are already present,
you need to position the Universal Binary JSON format very clearly compared
to the alternatives, especially to the aforementioned Smile, which seems to
solve very similar goals. It's obvious that the proposed format is simple,
and that this may be its unique strength, but unless you clearly
(quantitatively) show exactly how much simpler it is, people will not hurry to
adopt it. What's more, as there already exist several binary JSON formats,
failure to persuade the community that the new one is superior not only will
make your effort unsuccessful, it will also worsen things by introducing
additional discrepancy.

The second issue is about the numbers. This one is actually an issue of JSON
itself, and more specifically, the fact that JSON is specified only at the
syntax level and lacks a commonly accepted data-model (or meta-model,
information model, etc.) specifying the set of information that can be
encoded in JSON. The JSON specification just states how numbers are encoded.
It does not state whether 10, 10.0, or 1E1 are different numbers, neither
does it say how large the numbers can be, or whether the concrete way in
which they are written is important. As a result, the libraries for the 10
programming languages I mentioned vary hugely in the number ranges and
formats they support. Some distinguish integers from floats, others
don't, some expose the concrete string in which the number was encoded,
others don't, and so on. Most importantly, the supported ranges vary for
each library, ranging from 30-bit (not 32-bit) signed integers to unlimited
decimal numbers.

Having said that, although you are not the one who is responsible for the
situation, you should really treat the numbers very carefully. In my
opinion, thinking language neutrally, there are only 2 strategies (or data
models) that make sense. The first, which many people assume because of
JSON's origin, is to adopt JavaScript semantics, that is, to treat
numbers as 64-bit IEEE 754 floating point numbers. This makes things really
simple, but is not suitable for applications where rounding errors are not
tolerable, e.g. storing monetary values. The second approach is to assume
(unlimited) decimal numbers – tools are free to have their limitations, but
any real number that can be encoded as a finite decimal fraction is
supported by the specification, and tools try their best to deliver it
without any loss of precision. This approach makes most sense to me – it
allows JSON to be used for a large number of applications. However, it
conflicts with JSON's idea of being the intersection of the modern
programming languages, not the union. Adopting it means the following:
* There is no reason for having big integers at the format level, only
decimal numbers should be enough (1.0 == 1 == 1.00e0)
* The semantics and precision guarantees of the "double" encoding should be
very carefully and strictly defined. Keep in mind, that even simple decimal
values like 1.7 cannot be expressed exactly in binary IEEE 754 floats
* +0, -0, 0.0e0, are the same value, and according to the rule of picking
the smallest suitable type, they should be encoded as "byte"
* "BigDecimal should probably be renamed to something like BigFloat" may
not be a good idea – first, parsing binary floats with arbitrary precision
is neither easy nor commonly supported, and second, decimal
precision guarantees better suit most high-precision applications.
* The encoding of decimal numbers should be very carefully specified.
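
A minimal sketch of the points in the list above, assuming Python's standard decimal and fractions modules stand in for "unlimited decimal numbers". The variable names are only illustrative and are not part of any proposed specification.

```python
from decimal import Decimal
from fractions import Fraction

# Under a decimal data model, these three spellings denote one value.
same_value = Decimal("1.0") == Decimal("1") == Decimal("1.00e0")

# 1.7 has no exact binary IEEE 754 representation: the double nearest to
# 1.7 is not equal to the exact fraction 17/10.
double_is_exact = Fraction(1.7) == Fraction(17, 10)

# A decimal type keeps the written value exactly.
decimal_is_exact = Fraction(Decimal("1.7")) == Fraction(17, 10)

# +0, -0 and 0.0e0 compare equal as decimal values.
zeros_equal = Decimal("+0") == Decimal("-0") == Decimal("0.0e0")

print(same_value, double_is_exact, decimal_is_exact, zeros_equal)
```

The comparison through Fraction is used here only because it exposes the exact value stored in the double; a real encoder would not need it.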

Btw, I hope that in a few days I will publish the exact results of the
analysis I mentioned, which is actually a by-product of an effort to define
a strict data model for JSON.

Best,
Milo Sredkov


On Thu, Sep 22, 2011 at 5:33 PM, Don Owens <don@...> wrote:

> **
>
>
> I forgot to add that encoders should only use the big number format if the
> number is too big to fit in int64 (or int32, depending on which will be the
> largest in the spec) or a double. That way, if a decoder can't handle a
> number larger than int64 anyway, it does not need to implement decoding of
> big numbers -- you don't want a number that will fit in an int32 put into a
> big number format anyway.
>
>
> On Thu, Sep 22, 2011 at 7:15 AM, Don Owens <don@...> wrote:
>
> > Yes, that is what I was getting at. But see comments embedded.
> >
> > On Wed, Sep 21, 2011 at 7:50 PM, rkalla123 <rkalla@...> wrote:
> >
> >> **
> >>
> >>
> >> Don,
> >>
> >> I see your point. The way I understand it is that this would require 2
> new
> >> data types, effectively BigInt and BigDecimal.
> >>
> >> So say something along these lines:
> >>
> >> bigint - marker 'G'
> >> [G][129][129 big-endian ordered bytes representing a BigInt]
> >>
> >> It should be mentioned that they are signed ints, but doing two's
> > complement and such is probably too much work. Maybe just specify that
> the
> > first bit always represents the sign (0 for no sign, 1 for minus).
> >
> >
> >> bigdouble - marker 'W'
> >> [W][222][222 big-endian ordered bytes representing a BigDecimal]
> >>
> >>
> > BigDecimal should probably be renamed to something like BigFloat, since
> > decimal is ambiguous (used to mean base-10 and floating point). I'm less
> > familiar with large floating point, but I think a floating point number
> > should consist of a sign bit plus two integers (one for the
> > mantissa/significand and one for the exponent). In the interest of space
> > savings, I think the sign bit should just be included in the exponent and
> > order things so they look similar to the IEEE 754 spec, e.g.,
> >
> > [W][3][3 big-endian ordered bytes (where first bit is sign bit) of
> > exponent][222][222 big-endian ordered bytes of mantissa]
> >
> >
> >
> >> Thoughts?
> >>
> >
> > In terms of the documentation, I think the big integers and floats should
> > be qualified with a "should implement" instead of a "must implement",
> since,
> > as others have mentioned, not every encoder and decoder will be able to
> > handle these. I think this matches JSON implementations well. If an
> > encoder does not handle large numbers, it could just throw an error, just
> as
> > it should throw an error now if an oversized number is encountered in
> JSON.
> > The same goes for the decoder side. If there is no good way to represent
> a
> > large number in the language you are working in, throw an error
> indicating
> > that the number is too large.
> >
> > Have you looked into using variable-length integers for length
> specifiers?
> > If you have a lot of short strings (or big numbers, etc.) in your data,
> > these could significantly reduce your space usage (at the cost of more
> > complexity for the developer and CPU). There should be a balance between
> > space efficiency and complexity. Thoughts?
> >
> >
> >
> >>
> >>
> >
> >> --- In json@yahoogroups.com, Don Owens <don@...> wrote:
> >> >
> >> > I've seen very large numbers used in JSON. In Perl, that can be
> >> represented
> >> > as a Math::BigInt object. And that is the way I have implemented it in
> >> my
> >> > JSON module for Perl (JSON::DWIW). Python has arbitrary length
> integers
> >> > built-in. For my own language that I'm working on, I'm using libgmp in
> C
> >> to
> >> > handle arbitrary length integers.
> >> >
> >> > JSON is used as a data exchange format. I want to be able to do a
> >> > roundtrip, e.g., Python -> encoded -> Python with native integers
> (with
> >> > arbitrary length in this case). In JSON, this just works, as far as
> the
> >> > encoding is concerned. I see the need for this in any binary JSON
> format
> >> as
> >> > well. If a large number is represented as a string, then on the
> decoding
> >> > side, you don't know if that was a number or a string (just because it
> >> looks
> >> > like a number doesn't mean that the sender means it's a number). If,
> >> when
> >> > decoding JSON, the library can't handle large numbers, it has to throw
> >> an
> >> > error anyway. The same should go for binary JSON.
> >> >
> >> > ./don
> >>
> >>
> >>
> >
> >
> >
> > --
> > Don Owens
> > don@...
> >
> >
>
> --
> Don Owens
> don@...
>
> [Non-text portions of this message have been removed]
>
>
>



#1719 From: Stephan Beal <sgbeal@...>
Date: Thu Sep 22, 2011 10:00 pm
Subject: Re: Re: Universal Binary JSON Specification
stephan.beal
 
On Thu, Sep 22, 2011 at 11:47 PM, Milo Sredkov <miloslav@...> wrote:

> supported by the specification, and tools try their best to deliver it
> without any loss of precision. This approach makes most sense to me – it
> allows JSON to be used for a large number of applications. However, it
>
i would argue that a large number of _types_ of applications become
possible, but that a smaller _number_ of applications would be possible
because the complexities involved would probably not be
tolerable/implemented by the vast majority of the JSON libs. (Some would
argue that's a good thing - weeding out the market.)

While i cannot argue against anything you say about numerics - it's all
valid, as far as i'm concerned - the most beautiful thing about JSON is its
brain-dead simplicity. While it is, technically speaking, unfortunate that
we don't have a solid rule about how long a number may be, it is also
refreshing not to have to think too much about that type of detail in my
client code. For the vast majority of the libs/apps i (and, i suspect, most
people) write, numbers >2B are simply never used, which means i can live in
peace with a signed 32-bit limitation for those cases. IMO, anyone trying to
use JSON numbers for 17+-decimal-place precision is using the wrong tool for
the job.
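
To make the two limits mentioned above concrete, here is a small sketch using Python's standard json module. It assumes a decoder that maps every JSON number to a 64-bit double (as JavaScript does) and contrasts it with Python's own behaviour; the variable names are illustrative only.

```python
import json

# Largest value that fits a signed 32-bit integer (a little over 2 billion).
INT32_MAX = 2**31 - 1

# A double has a 53-bit significand, so neighbouring integers above 2**53
# can no longer be told apart.
big = 2**53 + 1
collides = float(big) == float(2**53)

# Python's json module keeps integers exact on a round trip...
roundtrip_exact = json.loads(json.dumps(big)) == big

# ...but a decoder that turns every number into a double silently drops
# the last digit of the same text.
as_double = json.loads(str(big), parse_int=float)
lost_precision = int(as_double) != big

print(INT32_MAX, collides, roundtrip_exact, lost_precision)
```

Whether the silent rounding matters depends on the application, which is the trade-off under discussion in this thread.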

--
----- stephan beal
http://wanderinghorse.net/home/stephan/



#1720 From: Tatu Saloranta <tsaloranta@...>
Date: Fri Sep 23, 2011 12:28 am
Subject: Re: Re: Universal Binary JSON Specification
cowtowncoder
 
On Thu, Sep 22, 2011 at 12:12 AM, Stephan Beal <sgbeal@...> wrote:
> On Thu, Sep 22, 2011 at 8:20 AM, Tatu Saloranta <tsaloranta@...> wrote:
>
>> **
>> Yes, to properly support full JSON data set, one should provide
>> BigInteger/-Decimal...
>>
>
> i don't agree: JSON does not specify an integer size, which means that
> supporting only an 8-bit int is still valid JSON.

You have a very interesting way of reading specifications -- when spec
does not limit magnitude or precision, you claim it's fine to use
whatever size: by that logic, it'd be fine to only support values 0
and 1. Or just 0.

Put another way: if you only support a subset, then the format cannot
represent all valid JSON documents, and thus is just a subset, not a
1-to-1 equivalent.

As to not all environments having BigDecimal/BigInteger, that's a red
herring -- as long as you define exact format of data, any environment
can support it, even if by just exposing an array of bytes.

-+ Tatu +-

#1721 From: Tatu Saloranta <tsaloranta@...>
Date: Fri Sep 23, 2011 12:31 am
Subject: Re: Re: Universal Binary JSON Specification
cowtowncoder
 
On Thu, Sep 22, 2011 at 7:21 AM, Don Owens <don@...> wrote:
> I didn't mean to imply that every environment has this support -- I'm very
> aware that most environments do not.  However, the same issue arises when
> using JSON.  If you encounter a number that is too large to fit in your
> available integer/float sizes, you should return an error (and you should
> definitely check this).  It should be the same in the case of binary JSON --
> if the environment can't handle numbers of that size, return an error.
>
> My concern is that I shouldn't have to switch formats or hack a format in
> order to do round trips with data that should fit in the "golden data
> structure".

I agree with this, and also with the general idea of using the most
economical applicable type (i.e. not using an unlimited-length
representation for small integers, for example), although this is more
a heuristic.
But fundamentally I think that binary representations should be able
to represent arbitrary JSON content, losslessly. Especially since this
is not all that difficult to do at the format level.
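
The "most economical applicable type" heuristic could be sketched roughly as follows. The type names and the set of fixed widths are assumptions made for illustration; the draft specification under discussion may define different ones.

```python
def smallest_integer_type(n: int) -> str:
    """Return the narrowest (hypothetical) type name that can hold n."""
    for name, bits in (("int8", 8), ("int16", 16), ("int32", 32), ("int64", 64)):
        # Range of a signed two's-complement integer of the given width.
        if -(1 << (bits - 1)) <= n < (1 << (bits - 1)):
            return name
    # Anything wider falls back to the arbitrary-length representation.
    return "bigint"


for value in (0, 127, 128, 2**31, 2**63):
    print(value, smallest_integer_type(value))
```

With this rule a decoder that cannot handle big numbers only fails on input that genuinely needs them, which matches Don's suggestion quoted earlier in the thread.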

-+ Tatu +-

#1722 From: Stephan Beal <sgbeal@...>
Date: Fri Sep 23, 2011 12:47 am
Subject: Re: Re: Universal Binary JSON Specification
stephan.beal
 
On Fri, Sep 23, 2011 at 2:28 AM, Tatu Saloranta <tsaloranta@...> wrote:

> **
> You have a very interesting way of reading specifications -- when spec
> does not limit magnitude or precision, you claim it's fine to use
> whatever size: by that logic, it'd be fine to only support values 0
> and 1. Or just 0.
>

Absolutely. That's a literal interpretation (but not a sane one, i admit!).


> Put another way: if you only support a subset, then format can not
> represent all valid JSON documents, and thus is just a subset, not a
> 1-to-1 equivalent.
>

Interpreted that way, all implementations must implement arbitrary-precision
numbers. (And that interpretation's also valid for a grammar which doesn't
specify a max length.)

So we're screwed either way ;).

> As to not all environments having BigDecimal/BigInteger, that's a red
> herring -- as long as you define exact format of data, any environment
> can support it, even if by just exposing array of bytes.
>

"Support it", sure, but not using a 1-to-1 type mapping as is generally
possible with other types.

--
----- stephan beal
http://wanderinghorse.net/home/stephan/



#1723 From: Patrick Maupin <pmaupin@...>
Date: Fri Sep 23, 2011 3:55 pm
Subject: Re: Re: Universal Binary JSON Specification
patmaupin
 
I'm all for big integer support.  I use it all the time (from Python).

As an aside, as others have pointed out, there are other similar
efforts around.  If you really want to distinguish this one by making
it truly universal, then you really do have to support big integers.
JSON != JavaScript

In terms of design features, have you looked at the Python
pickle/cPickle modules?  Even though the problem you are solving is
not exactly the same as the problem those solve, the problems are
quite similar, and it may be instructive to examine a data format that
solves a similar problem, and the well-tested underlying code (both
pure Python and C available) that implements readers and writers for
the format.

Thanks and best regards,
Patrick Maupin

On Thu, Sep 22, 2011 at 8:43 AM, rkalla123 <rkalla@...> wrote:
>
>
>
> Stephan,
>
> It reminds me of our conversation earlier about 64-bit. As you mentioned, Don
> has a great point, but the uniqueness of the data structure (I doubt the
> majority of people using JSON would use it) combined with my gut telling me to
> wait, I think I am going to specify BigInt/BigDecimal support in the official
> specification under a new section "Pending" and wait for feedback from more
> people.
>
> I am working hard to keep the spec so conceptually simple that it just fits in
> a tiny brain-pocket any time someone reads it and they feel empowered
> immediately to start getting work done.
>
> This might mean at the beginning some fringe items not being addressed, but
> I'd rather add them under strong demand later than add them now and make those
> extra 10% of features suddenly make the format seem JUST complex enough that
> someone skimming it, hoping for something simple to use, starts to glaze their
> eyes over and not be interested anymore.
>
> I really appreciate this dialog on the subject guys, it helps get all the
> aspects out on the table early!

#1724 From: "rkalla123" <rkalla@...>
Date: Fri Sep 23, 2011 4:16 pm
Subject: Re: Universal Binary JSON Specification
rkalla123
 
Patrick,
Thank you for the pointer.

--- In json@yahoogroups.com, Patrick Maupin <pmaupin@...> wrote:
>
> I'm all for big integer support.  I use it all the time (from Python).
>
> As an aside, as others have pointed out, there are other similar
> efforts around.  If you really want to distinguish this one by making
> it truly universal, then you really do have to support big integers.
> JSON != JavaScript
>
> In terms of design features, have you looked at the Python
> pickle/cPickle modules?  Even though the problem you are solving is
> not exactly the same as the problem those solve, the problems are
> quite similar, and it may be instructive to examine a data format that
> solves a similar problem, and the well-tested underlying code (both
> pure Python and C available) that implements readers and writers for
> the format.
>
> Thanks and best regards,
> Patrick Maupin
>
> On Thu, Sep 22, 2011 at 8:43 AM, rkalla123 <rkalla@...> wrote:
> >
> >
> >
> > Stephan,
> >
> > It reminds me of our conversation earlier about 64-bit. As you mentioned,
> > Don has a great point, but the uniqueness of the data structure (I doubt the
> > majority of people using JSON would use it) combined with my gut telling me to
> > wait, I think I am going to specify BigInt/BigDecimal support in the official
> > specification under a new section "Pending" and wait for feedback from more
> > people.
> >
> > I am working hard to keep the spec so conceptually simple that it just fits
> > in a tiny brain-pocket any time someone reads it and they feel empowered
> > immediately to start getting work done.
> >
> > This might mean at the beginning some fringe items not being addressed, but
> > I'd rather add them under strong demand later than add them now and make those
> > extra 10% of features suddenly make the format seem JUST complex enough that
> > someone skimming it, hoping for something simple to use, starts to glaze their
> > eyes over and not be interested anymore.
> >
> > I really appreciate this dialog on the subject guys, it helps get all the
> > aspects out on the table early!
>

#1725 From: John Cowan <cowan@...>
Date: Fri Sep 23, 2011 4:36 pm
Subject: Re: Re: Universal Binary JSON Specification
johnwcowan
 
Patrick Maupin scripsit:

> In terms of design features, have you looked at the Python
> pickle/cPickle modules?  Even though the problem you are solving is
> not exactly the same as the problem those solve, the problems are
> quite similar, and it may be instructive to examine a data format that
> solves a similar problem, and the well-tested underlying code (both
> pure Python and C available) that implements readers and writers for
> the format.

Unfortunately, the various pickle formats are apparently not documented
anywhere that Google can find.  Can you provide a pointer?

(There is also a JavaScript implementation, but it handles only pickle
format 0, the textual representation, as you'd expect.)

--
John Cowan  cowan@...  http://ccil.org/~cowan
Any sufficiently-complicated C or Fortran program contains an ad-hoc,
informally-specified bug-ridden slow implementation of half of Common Lisp.
         --Greenspun's Tenth Rule of Programming (rules 1-9 are unknown)

#1726 From: "rkalla123" <rkalla@...>
Date: Fri Sep 23, 2011 4:43 pm
Subject: Re: Universal Binary JSON Specification
rkalla123
 
Don,

Great feedback so far, I have a few thoughts on the subject:

1. The hard-to-measure value of a specification being simple and immediately
grok'able is more important than total coverage. I think we've all seen that any
number of times... for example XML vs JSON. XML defines support for every
possible data structure known and unknown through schema references. The *need*
to become so incredibly verbose sent people screaming into the arms of a simpler
format (JSON) at the first sign of alternatives.

2. As soon as a specification, of any kind, delves into concepts that are not
immediately map-able to a mental model that you are familiar with, I would say
assimilation of the concepts slows down about 4x.

3. Theoretically I agree with you 110% that the format needs to natively support
arbitrarily large numeric formats to be successful in all sorts of use cases.
There is absolutely no argument here, every reason you have given is spot on.

4. BUT, I have a very strong feeling (I don't know why... divine intervention
maybe) that the addition of these two arbitrary types that are unfamiliar to
most people writing software today, could be *just* strange enough to seriously
slow down assimilation of a new data format.

e.g. "int, ok got it... double, yep use that all the time, String, yea makes
sense, BigInt... wait... what is an arbitrarily long number? I don't get it,
does <MY_LANG> support that? I've never used one... how do you convert a byte[]
into a *number*... weird... I gotta go read some docs now"
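
For readers wondering about the byte[]-to-number step, here is a rough sketch of the layout proposed earlier in the thread: marker 'G', a length, then big-endian bytes whose first bit is the sign. The one-byte length field and the function names are assumptions for illustration only; nothing here is taken from a finished specification.

```python
def encode_bigint(n: int) -> bytes:
    """Encode n as [G][length][big-endian sign-magnitude bytes]."""
    magnitude = abs(n)
    # Reserve one extra bit at the top of the first byte for the sign.
    size = max(1, (magnitude.bit_length() + 1 + 7) // 8)
    if size > 255:
        raise ValueError("number too large for a one-byte length field")
    payload = bytearray(magnitude.to_bytes(size, "big"))
    if n < 0:
        payload[0] |= 0x80  # set the sign bit: 1 means minus
    return b"G" + bytes([size]) + bytes(payload)


def decode_bigint(data: bytes) -> int:
    """Decode a value produced by encode_bigint."""
    if data[:1] != b"G":
        raise ValueError("not a bigint marker")
    size = data[1]
    payload = data[2:2 + size]
    if len(payload) != size:
        raise ValueError("truncated bigint")
    negative = bool(payload[0] & 0x80)
    # Clear the sign bit before reading the magnitude.
    first = bytes([payload[0] & 0x7F])
    magnitude = int.from_bytes(first + payload[1:], "big")
    return -magnitude if negative else magnitude


for value in (0, 5, -1, 128, -2**100):
    assert decode_bigint(encode_bigint(value)) == value
```

In a language without built-in big integers, the decoder would hand the payload to a library such as GMP or report that the number is too large.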

That is exaggerated for sure, but you see what I am getting at. Because we are
operating at the spec level, the nature of the work is to nit-pick and poke and
prod and make sure every 'i' is dotted and every 't' is crossed. That is good, but
there is a point at which the intersection between what a spec provides and what
people want maximizes and then starts to wane at the cost of the success of the
*entire* spec unfortunately.

I have a very strong feeling that the complexities of arbitrarily sized numbers
are exactly that apex at which returns start to fall off for the greater good.

5. Given #3 and #4, I want to define the BigInt and BigDecimal support as
proposals and add them to the specification on the site and let people discuss
them further until there is a strong preference for or against them.

I don't want you to think that I disagree with you, I don't... it is just this
very strong nagging gut feeling I have that I have to honor in the name of
simplicity.

6. I would make the argument, that if you took the grouping of people ALL using
JSON as a data interchange, say 100,000 people, the number of people in that
group using BigInt and BigDecimals to exchange data between two internal systems
that both support those numeric formats is... a small percentage. (this leads
into point #7)

7. Simplicity is what will make this specification succeed over other,
potentially faster specs. JSON never won the format war because it was fast or
more efficient... it won because it was so unbelievably easy to use.

I could sit down with a C developer and an Erlang developer and say "OK guys, my
web service is going to generate replies like THIS, you two need to process that
and send me back results that look like that too"

There is no discussion of namespaces, schemas, DTDs, encoding, Dublin Core or
endianness... it was like describing a CSV file format with braces to someone.

I am trying to model that in binary in more than just data representation, but
also spirit. That is why some of the binary representations are possibly 1 or 2
bytes longer than they could be if maximally optimized or why simple human
readable char markers were chosen for easy discovery in a HEX editor.

It is my belief that the utter simplicity of describing a single layout
(marker-size-data) that maps to known types in almost every modern language is
what will make this work well.
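
A minimal sketch of what reading a marker-size-data stream might look like. The one-character ASCII marker and the fixed four-byte big-endian size are assumptions chosen for illustration; the actual draft may use different markers and length widths.

```python
import struct


def write_value(marker: str, payload: bytes) -> bytes:
    """Serialize one value as [marker][4-byte big-endian size][data]."""
    return marker.encode("ascii") + struct.pack(">I", len(payload)) + payload


def read_values(buf: bytes):
    """Yield (marker, payload) pairs from a buffer of concatenated values."""
    pos = 0
    while pos < len(buf):
        if pos + 5 > len(buf):
            raise ValueError("truncated header")
        marker = chr(buf[pos])
        (size,) = struct.unpack_from(">I", buf, pos + 1)
        start = pos + 5
        end = start + size
        if end > len(buf):
            raise ValueError("truncated payload")
        yield marker, buf[start:end]
        pos = end


stream = write_value("S", b"hello") + write_value("G", b"\x01")
print(list(read_values(stream)))
```

Because every value carries its own size, a reader can skip any type it does not understand, which is part of what keeps the single layout easy to describe.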

This may limit the Universal Binary JSON Spec from being the ultimate binary
data format, but there are other more specific and difficult-to-use specs that
offer faster performance if that level of detail is what you need (e.g. protobuf
comes to mind).

-----------
My goal is to create the every-man's binary format just like JSON became the
every-man's data interchange format.

It isn't for everybody, but it works wonderfully for a whole lot of people.
-----------

Thank you again for the well thought out feedback Don.

--- In json@yahoogroups.com, Don Owens <don@...> wrote:
>
> I forgot to add that encoders should only use the big number format if the
> number is too big to fit in int64 (or int32, depending on which will be the
> largest in the spec) or a double.  That way, if a decoder can't handle a
> number larger than int64 anyway, it does not need to implement decoding of
> big numbers -- you don't want a number that will fit in an int32 put into a
> big number format anyway.
>
>
> On Thu, Sep 22, 2011 at 7:15 AM, Don Owens <don@...> wrote:
>
> > Yes, that is what I was getting at.  But see comments embedded.
> >
> > On Wed, Sep 21, 2011 at 7:50 PM, rkalla123 <rkalla@...> wrote:
> >
> >> **
> >>
> >>
> >> Don,
> >>
> >> I see your point. The way I understand it is that this would require 2 new
> >> data types, effectively BigInt and BigDecimal.
> >>
> >> So say something along these lines:
> >>
> >> bigint - marker 'G'
> >> [G][129][129 big-endian ordered bytes representing a BigInt]
> >>
> >> It should be mentioned that they are signed ints, but doing two's
> > complement and such is probably too much work.  Maybe just specify that the
> > first bit always represents the sign (0 for no sign, 1 for minus).
> >
> >
> >> bigdouble - marker 'W'
> >> [W][222][222 big-endian ordered bytes representing a BigDecimal]
> >>
> >>
> > BigDecimal should probably be renamed to something like BigFloat, since
> > decimal is ambiguous (used to mean base-10 and floating point).  I'm less
> > familiar with large floating point, but I think a floating point number
> > should consist of a sign bit plus two integers (one for the
> > mantissa/significand and one for the exponent).  In the interest of space
> > savings, I think the sign bit should just be included in the exponent and
> > order things so they look similar to the IEEE 754 spec, e.g.,
> >
> > [W][3][3 big-endian ordered bytes (where first bit is sign bit) of
> > exponent][222][222 big-endian ordered bytes of mantissa]
> >
> >
> >
> >>  Thoughts?
> >>
> >
> > In terms of the documentation, I think the big integers and floats should
> > be qualified with a "should implement" instead of a "must implement", since,
> > as others have mentioned, not every encoder and decoder will be able to
> > handle these.  I think this matches JSON implementations well.  If an
> > encoder does not handle large numbers, it could just throw an error, just as
> > it should throw an error now if an oversized number is encountered in JSON.
> >  The same goes for the decoder side.  If there is no good way to represent a
> > large number in the language you are working in, throw an error indicating
> > that the number is too large.
> >
> > Have you looked into using variable-length integers for length specifiers?
> >  If you have a lot of short strings (or big numbers, etc.) in your data,
> > these could significantly reduce your space usage (at the cost of more
> > complexity for the developer and CPU).  There should be a balance between
> > space efficiency and complexity.  Thoughts?
> >
> >
> >
> >>
> >>
> >
> >> --- In json@yahoogroups.com, Don Owens <don@> wrote:
> >> >
> >> > I've seen very large numbers used in JSON. In Perl, that can be
> >> represented
> >> > as a Math::BigInt object. And that is the way I have implemented it in
> >> my
> >> > JSON module for Perl (JSON::DWIW). Python has arbitrary length integers
> >> > built-in. For my own language that I'm working on, I'm using libgmp in C
> >> to
> >> > handle arbitrary length integers.
> >> >
> >> > JSON is used as a data exchange format. I want to be able to do a
> >> > roundtrip, e.g., Python -> encoded -> Python with native integers (with
> >> > arbitrary length in this case). In JSON, this just works, as far as the
> >> > encoding is concerned. I see the need for this in any binary JSON format
> >> as
> >> > well. If a large number is represented as a string, then on the decoding
> >> > side, you don't know if that was a number or a string (just because it
> >> looks
> >> > like a number doesn't mean that the sender means it's a number). If,
> >> when
> >> > decoding JSON, the library can't handle large numbers, it has to throw
> >> an
> >> > error anyway. The same should go for binary JSON.
> >> >
> >> > ./don
> >>
> >>
> >>
> >
> >
> >
> > --
> > Don Owens
> > don@...
> >
> >
>
>
> --
> Don Owens
> don@...
>
>
> [Non-text portions of this message have been removed]
>

#1727 From: "rkalla123" <rkalla@...>
Date: Fri Sep 23, 2011 4:47 pm
Subject: Re: Universal Binary JSON Specification
rkalla123
 
Milo,

Agreed on how the format must be presented on the website; I am keeping your
post bookmarked as a TODO list as I work on formalizing the spec on the website.

The details on the number formats gave me a little miniature stroke, I need some
more time to digest that ;)

-R

--- In json@yahoogroups.com, Milo Sredkov <miloslav@...> wrote:
>
> Hello Riyad, Stephan, Don, Tatu, and all group members,
>
> I recently analysed about 70 of the libraries linked from json.org (almost
> all listed in the C++, C, Java, Python, Haskell, JavaScript, Ruby, C#, PHP,
> and Lisp sections) and would like to share some opinions about the presented
> specification, and also about some of the topics that rose in the discussion
> so far.
>
> First, I'm really happy to see people trying to do cool things in favour of
> the JSON community – a simple and efficient binary JSON representation is
> for sure a cool thing from which we can all benefit. Although you will
> probably need to do more work in order to  show everyone that this format is
> *the one*, initiating a discussion is probably the right thing to do.
>
> IMHO there are two things, already pointed out by the others, which I find
> disturbing. First, as other binary JSON representations are already present,
> you need to position the Universal Binary JSON format very clearly compared
> to the alternatives, especially to the aforementioned Smile, which seems to
> solve very similar goals. It's obvious that the proposed format is simple,
> and that this may be its unique strength, but unless you clearly
> (quantitatively) show exactly how much simpler it is, people will not hurry to
> adopt it. What's more, as there already exist several binary JSON formats,
> failure to persuade the community that the new one is superior not only will
> make your effort unsuccessful, it will also worsen things by introducing
> additional discrepancy.
>
> The second issue is about the numbers. This one is actually an issue of JSON
> itself, and more specifically, the fact that JSON is specified only at the
> syntax level and lacks a commonly accepted data model (or meta-model,
> information model, etc.) specifying the set of information that can be
> encoded in JSON. The JSON specification just states how numbers are encoded.
> It does not state whether 10, 10.0, or 1E1 are different numbers, nor
> does it say how large the numbers can be, or whether the concrete way in
> which they are written is important. As a result, in the libraries for the 10
> programming languages I mentioned, there are huge variations in the
> supported range formats. Some distinguish integers from floats, others
> don't, some expose the concrete string in which the number was encoded,
> others don't, and so on. Most importantly, the supported ranges vary for
> each library, starting from 30bit (not 32) signed integers to unlimited
> decimal numbers.
>
> Having said that, although you are not the one who is responsible for the
> situation, you should really treat the numbers very carefully. In my
> opinion, thinking language neutrally, there are only 2 strategies (or data
> models) that make sense. The first, which many people imply because of
> JSON's origin, is to assume JavaScript semantics, that is, in this case that
> numbers are 64bit IEEE 754 floating point numbers. This makes things really
> simple, but is not suitable for applications where rounding errors are not
> tolerable, e.g. storing monetary values. The second approach is to assume
> (unlimited) decimal numbers – tools are free to have their limitations, but
> any real number that can be encoded as a finite decimal fraction is
> supported by the specification, and tools try their best to deliver it
> without any loss of precision. This approach makes most sense to me – it
> allows JSON to be used for a large number of applications. However, it
> contrasts to the JSON's idea of being the intersection of the modern
> programming languages, not the union. Adopting it means the following:
> * There is no reason for having big integers at the format level, only
> decimal numbers should be enough (1.0 == 1 == 1.00e0)
> * The semantics and precision guarantees of the "double" encoding should be
> very carefully and strictly defined. Keep in mind, that even simple decimal
> values like 1.7 cannot be expressed exactly in binary IEEE 754 floats
> * +0, -0, 0.0e0, are the same value, and according to the rule of picking
> the smallest suitable type, they should be encoded as "byte"
>  * "BigDecimal should probably be renamed to something like BigFloat" may
> not be a good idea – first, parsing binary floats with arbitrary precision
> is not something easy and commonly supported, and secondly, decimal
> precision guarantees better suit most high-precision applications.
> * The encoding of decimal numbers should be very carefully specified.
>
> Btw, I hope that in a few days I will publish the exact results of the
> analysis I mentioned, which is actually a by-product of an effort to define
> a strict data model for JSON.
>
> Best,
> Milo Sredkov
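
For anyone who wants to see the number issue from the quoted message in practice, here is a minimal sketch. It assumes Python's standard json and decimal modules, which are only one of the many libraries Milo surveyed, so other parsers may well behave differently.

```python
import json
from decimal import Decimal

# Python's json module decides between int and float from the syntax alone.
as_int = json.loads("10")      # parsed as int
as_float = json.loads("10.0")  # parsed as float
as_exp = json.loads("1E1")     # parsed as float because of the exponent

# The three literals compare equal as numbers but arrive as different types.
assert as_int == as_float == as_exp
assert type(as_int) is int
assert type(as_float) is float and type(as_exp) is float

# 1.7 has no exact binary IEEE 754 representation; Decimal(float) exposes
# the value that is actually stored in the double.
stored = Decimal(1.7)
assert stored != Decimal("1.7")

# Asking the parser for decimals keeps the written value exactly.
exact = json.loads("1.7", parse_float=Decimal)
assert exact == Decimal("1.7")
```

The last two steps are the "double" versus "decimal" data models from the quoted message, side by side.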

#1728 From: "rkalla123" <rkalla@...>
Date: Fri Sep 23, 2011 4:49 pm
Subject: Re: Universal Binary JSON Specification
rkalla123
 
Stephan, your comment sums up my feelings nicely:

--- In json@yahoogroups.com, Stephan Beal <sgbeal@...> wrote:
> While i cannot argue against anything you say about numerics - it's all
> valid, as far as i'm concerned - the most beautiful thing about JSON is its
> brain-dead simplicity. While it is, technically speaking, unfortunate that
> we don't have a solid rule about how long a number may be, it is also
> refreshing not to have to think too much about that type of detail in my
> client code.

#1729 From: Tatu Saloranta <tsaloranta@...>
Date: Sat Sep 24, 2011 3:03 am
Subject: Re: Re: Universal Binary JSON Specification
cowtowncoder
 
On Thu, Sep 22, 2011 at 3:00 PM, Stephan Beal <sgbeal@...> wrote:
> On Thu, Sep 22, 2011 at 11:47 PM, Milo Sredkov <miloslav@...> wrote:
> supported by the specification, and tools try their best to deliver it
>>
>> without any loss of precision. This approach makes most sense to me – it
>> allows JSON to be used for a large number of applications. However, it
>>
> i would argue that a large number of _types_ of applications become
> possible, but that a smaller _number_ of applications would be possible
> because the complexities involved would probably not be
> tolerable/implemented by the vast majority of the JSON libs. (Some would
> argue that's a good thing - weeding out the market.)

I think it is patronizing to suggest that something as simple as
supporting Big Integer and -Decimal would be beyond skills of
competent parser writers -- world is full of XML, YAML and BSON
parsers, even though as formats they are vastly more complex than
anything one could do to support basic 100% JSON information model.

And optimizing for simplicity of format seems misguided here, as it is
irrelevant for end users. Why? Since this is a binary format, no user will
ever write or hand-edit such content, and all such encoded content
comes from:

(a) Generators that produce format, or
(b) Converters that take JSON, produce binary alternative

(and conversely for parsers)

This means that simplicity for end users is mostly defined by
simplicity of API to process content -- if it is 100% JSON compatible,
it's same as any old JSON API, which in turn is in optimal case based
on simplicity of logical content model which should be same for JSON
and matching binary serialization.

So: for end users, simplicity of underlying format is all but
irrelevant; and as to implementors of supporting libraries (of which
only a small number are needed anyway, a couple per language), their
interest is usually based more on value of supporting format (how
widely format is or is expected to be used) and possible benefits of
format (compactness, processing efficiency).

-+ Tatu +-

#1730 From: Patrick Maupin <pmaupin@...>
Date: Sat Sep 24, 2011 4:25 am
Subject: Re: Re: Universal Binary JSON Specification
patmaupin
 
On Fri, Sep 23, 2011 at 11:36 AM, John Cowan <cowan@...> wrote:

> **
>
>  Unfortunately, the various pickle formats are apparently not documented
> anywhere that Google can find. Can you provide a pointer?
>
>
The best documentation I know of is in the source for the pickle and
pickletools modules.  You can find them, e.g. like this:

$ python
Python 2.6.4 (r264:75706, Dec  7 2009, 18:43:55)
[GCC 4.4.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import pickle, pickletools
>>> pickle.__file__[:-1]
'/usr/lib/python2.6/pickle.py'
>>> pickletools.__file__[:-1]
'/usr/lib/python2.6/pickletools.py'
>>>
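
Besides reading the source, the pickletools module can also show the opcodes in a concrete pickle. A minimal sketch, assuming Python 3 and pickle protocol 2 (both are my choices for illustration, not something from the thread):

```python
import pickle
import pickletools

# Pickle an integer too large for 64 bits, using protocol 2.
data = pickle.dumps(12345678901234567890, protocol=2)

# genops() yields (opcode, argument, position) for each opcode in the stream.
opcode_names = [op.name for op, arg, pos in pickletools.genops(data)]

# dis() prints an annotated disassembly of the same bytes to stdout.
pickletools.dis(data)
```

Each opcode name printed by dis() is documented in the comments of pickletools.py, which is where the format details live.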

Regards,
Pat



#1731 From: Stephan Beal <sgbeal@...>
Date: Sat Sep 24, 2011 1:15 pm
Subject: Re: Re: Universal Binary JSON Specification
stephan.beal
 
On Sat, Sep 24, 2011 at 5:03 AM, Tatu Saloranta <tsaloranta@...> wrote:

> I think it is patronizing to suggest that something as simple as
> supporting Big Integer and -Decimal would be beyond skills of
> competent parser writers -- world is full of XML, YAML and BSON
>

Not beyond the skills - beyond the patience and needs. i host 2 (C and C++)
JSON libraries and i have absolutely no need for big numbers, so i would
never bother to add them. Because of that, people who DO want big numbers
won't even take a second look at my libs. i.e., mine then die out through
darwinian processes.

--
----- stephan beal
http://wanderinghorse.net/home/stephan/



#1732 From: Dennis Gearon <gearond@...>
Date: Sat Sep 24, 2011 5:25 pm
Subject: Re: Re: Universal Binary JSON Specification
gearond...
 
this subject in this mail list generated more email in one week than the list
had in 6 months.

I therefore stepped back, too much to follow.

Anyone willing to send me a summary of what has gone on?

  Dennis Gearon


Signature Warning
----------------
It is always a good idea to learn from your own mistakes. It is usually a better
idea to learn from others’ mistakes, so you do not have to make them yourself.
from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'


EARTH has a Right To Life,
otherwise we all die.




________________________________
From: Patrick Maupin <pmaupin@...>
To: json@yahoogroups.com
Sent: Fri, September 23, 2011 9:25:21 PM
Subject: Re: [json] Re: Universal Binary JSON Specification


On Fri, Sep 23, 2011 at 11:36 AM, John Cowan <cowan@...> wrote:

> **
>
>  Unfortunately, the various pickle formats are apparently not documented
> anywhere that Google can find. Can you provide a pointer?
>
>
The best documentation I know of is in the source for the pickle and
pickletools modules.  You can find them, e.g. like this:

$ python
Python 2.6.4 (r264:75706, Dec  7 2009, 18:43:55)
[GCC 4.4.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import pickle, pickletools
>>> pickle.__file__[:-1]
'/usr/lib/python2.6/pickle.py'
>>> pickletools.__file__[:-1]
'/usr/lib/python2.6/pickletools.py'
>>>

Regards,
Pat


#1733 From: "rkalla123" <rkalla@...>
Date: Sat Sep 24, 2011 5:52 pm
Subject: Re: Universal Binary JSON Specification
rkalla123
 
Hey Dennis,

Here is where we are:

I have been working on a Universal Binary JSON specification and asked the group
for feedback on it. You can check the spec here, it isn't too long:
https://docs.google.com/document/d/12SimAfBVcl8Fd-lr_SSBkM5B_PyEhDRfhgu1Lzvfpfw/edit?hl=en_US

I am in the midst of formalizing it to the main site here:
http://ubjson.org/

The reason for (another, hopefully final) binary JSON format is:
1. To strictly adhere to the original JSON spec, introducing no
incompatibilities or binary-only data structures that could lead to
incompatibilities (BSON, BJSON).
2. To strictly follow the core "Ease of use" tenet of JSON. This means more
verbosity than something like Smile, but a dead-simple, singular data structure to
parse and understand.

The struct looks like this:
[type, 1-byte char]([length/count, 4-byte int])([binary data])
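
To make that layout concrete, here is a minimal Python sketch of one string value. The marker byte b"s", the unsigned big-endian length, and the UTF-8 payload are assumptions for illustration only; the actual type codes and byte order are whatever the spec document says.

```python
import struct

# Hypothetical marker byte; the real type codes are defined by the spec.
STRING_MARKER = b"s"

def encode_string(text: str) -> bytes:
    """Encode text as [1-byte type][4-byte length][payload bytes]."""
    payload = text.encode("utf-8")  # assumed payload encoding
    # ">I" is an unsigned 4-byte big-endian integer (assumed byte order).
    return STRING_MARKER + struct.pack(">I", len(payload)) + payload

def decode_string(buf: bytes) -> str:
    """Inverse of encode_string for a buffer holding exactly one value."""
    marker, length = struct.unpack_from(">cI", buf, 0)
    if marker != STRING_MARKER:
        raise ValueError("unexpected type marker: %r" % marker)
    return buf[5:5 + length].decode("utf-8")

encoded = encode_string("hello")
decoded = decode_string(encoded)
```

A reader only ever needs to look at the first byte to know what follows, which is the whole appeal of the single structure.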

My goal is that you can understand the spec in under 10mins. I see this as a
fundamental requirement of the spec.

In the specification, I took the JavaScript "Number" data type, and broke it
down into 4 sub-types that I found mapped the most immediately to discrete data
types in most of the popular programming languages I checked specs on.

Namely:
* byte, 1-byte
* int32, 4-byte
* int64, 8-byte
* double, 8-byte

In most of the languages I checked, these had native representations and
optimized performance characteristics because of the support in the platform
(e.g. C#, Java, etc.)
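
As a rough sketch of how a generator might choose among those four types, assuming signed two's-complement ranges for the integer types (the function name and the range choice are mine, not taken from the spec):

```python
def pick_number_type(value):
    """Return the name of the smallest of the four types that can hold value."""
    if isinstance(value, bool):
        # bool is a subclass of int in Python, but JSON treats it separately.
        raise TypeError("booleans are not JSON numbers")
    if isinstance(value, float):
        return "double"
    if not isinstance(value, int):
        raise TypeError("not a number: %r" % (value,))
    if -2**7 <= value < 2**7:
        return "byte"
    if -2**31 <= value < 2**31:
        return "int32"
    if -2**63 <= value < 2**63:
        return "int64"
    # Anything larger is exactly the case the thread is arguing about.
    raise OverflowError("integer does not fit in int64")
```

For example, pick_number_type(100) gives "byte" and pick_number_type(2**40) gives "int64", while 2**64 raises because none of the four types can hold it.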

I think for the most part everyone has been OK up to this point; the discussion
that spawned more discussion was the support for arbitrarily long Integer and
Decimal numbers.

Specifying the format is easy enough (marker followed by length of bytes
followed by the bytes, big-endian), but a few folks (myself included) feel that
most developers are unlikely to need arbitrarily long numbers, and supporting
them would add 2 new data constructs to the specification that *don't* have a
native representation in every language (although they do in most).
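
For reference, the arbitrarily long integer encoding described above could look something like this in Python. The marker byte b"H" and the signed two's-complement interpretation of the bytes are assumptions on my part; only "marker, length, bytes, big-endian" comes from the discussion.

```python
import struct

# Hypothetical marker byte for an arbitrarily long integer.
HUGE_MARKER = b"H"

def encode_huge(value: int) -> bytes:
    """Encode value as [marker][4-byte length][big-endian bytes]."""
    # Enough bytes for the magnitude plus a sign bit.
    nbytes = (value.bit_length() + 8) // 8
    body = value.to_bytes(nbytes, "big", signed=True)
    return HUGE_MARKER + struct.pack(">I", len(body)) + body

def decode_huge(buf: bytes) -> int:
    """Inverse of encode_huge for a buffer holding exactly one value."""
    marker, length = struct.unpack_from(">cI", buf, 0)
    if marker != HUGE_MARKER:
        raise ValueError("unexpected type marker: %r" % marker)
    return int.from_bytes(buf[5:5 + length], "big", signed=True)

roundtrip = decode_huge(encode_huge(2**100))
```

Python happens to have native big integers, so this is short here; the concern in the thread is the languages where int.to_bytes has no built-in equivalent.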

My feeling is that adding this complexity, however small, might make the spec
just that much harder to understand in under 10mins or that much harder to write
a parser/generator for that it could stymie adoption and usability.

Keep in mind that JSON's support for arbitrarily long numbers is undefined
and already depends on the parser and language you are using, so I saw no reason
to diverge by addressing this specifically in the binary spec when it has
no counterpart in the JSON spec. People using the binary spec can work around this
by string-encoding their long numbers, but I appreciate that is a sub-optimal
approach for a lot of folks.

And that is the recap of where we are. Look forward to your feedback!

Best,
Riyad

--- In json@yahoogroups.com, Dennis Gearon <gearond@...> wrote:
>
> this subject in this mail list generated more email in one week than the list
> had in 6 months.
>
> I therefore stepped back, too much to follow.
>
> Anyone willing to send me a summary of what has gone on?
>
>  Dennis Gearon

#1734 From: John Cowan <cowan@...>
Date: Sat Sep 24, 2011 6:01 pm
Subject: Re: Re: Universal Binary JSON Specification
johnwcowan
 
rkalla123 scripsit:

> Keep in mind that JSON's support for arbitrarily long numbers
> is undefined and already depends on the parser and language you are
> using, so I saw no reason to diverge by addressing this
> specifically in the binary spec when it has no counterpart in the
> JSON spec. People using the binary spec can work around this by
> string-encoding their long numbers, but I appreciate that is a
> sub-optimal approach for a lot of folks.

Well then, why not add just one more type code for "string-encoded
number"?  That way, ordinary JSON can be round-tripped with 100%
reliability.  Well, if you have a string, array, or object with more
than 4G elements, it still can't be, but I think that objection is
*really* technical.  The other formats allow it, but at the expense of
more processing complexity.
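
A minimal sketch of that idea in Python, assuming hypothetical marker bytes b"s" for ordinary strings and b"n" for string-encoded numbers, and the same 1-byte type plus 4-byte length layout discussed earlier in the thread:

```python
import re
import struct
from decimal import Decimal

STRING_MARKER = b"s"       # hypothetical code for an ordinary string
NUMBER_TEXT_MARKER = b"n"  # hypothetical code for a string-encoded number

# The JSON number grammar, so only valid JSON numbers get the number code.
JSON_NUMBER = re.compile(r"-?(0|[1-9]\d*)(\.\d+)?([eE][+-]?\d+)?")

def _frame(marker: bytes, payload: bytes) -> bytes:
    """Build [1-byte type][4-byte big-endian length][payload]."""
    return marker + struct.pack(">I", len(payload)) + payload

def encode_text(text: str) -> bytes:
    return _frame(STRING_MARKER, text.encode("utf-8"))

def encode_number_literal(literal: str) -> bytes:
    if not JSON_NUMBER.fullmatch(literal):
        raise ValueError("not a JSON number: %r" % literal)
    return _frame(NUMBER_TEXT_MARKER, literal.encode("ascii"))

def decode(buf: bytes):
    """Return str for the string code and Decimal for the number code."""
    marker, length = struct.unpack_from(">cI", buf, 0)
    payload = buf[5:5 + length]
    if marker == STRING_MARKER:
        return payload.decode("utf-8")
    if marker == NUMBER_TEXT_MARKER:
        return Decimal(payload.decode("ascii"))
    raise ValueError("unexpected type marker: %r" % marker)

big = "123456789012345678901234567890.5"
as_number = decode(encode_number_literal(big))
as_string = decode(encode_text(big))
```

The same digits come back as a Decimal in one case and as a plain str in the other, so the decoder no longer has to guess what the sender meant.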

In fact, your real selling points are simple processing (at the expense
of some compression) and 100% JSON compatibility in actual use.

--
Income tax, if I may be pardoned for saying so,         John Cowan
is a tax on income.  --Lord Macnaghten (1901)           cowan@...
