aalto-xml-interest · Aalto XML Parser (stax)

Group Information

  • Members: 17
  • Category: XML
  • Founded: Feb 2, 2008
  • Language: English

#21 From: Tatu Saloranta <tsaloranta@...>
Date: Thu Jan 22, 2009 7:08 am
Subject: Version 0.9.3 released; licensing clarified
cowtowncoder
It has been a while, but after Woodstox 4.0.0 was released and the Typed
Access API (part of stax2-api) was finalized, it was time to work a bit on
Aalto. With 0.9.3, Aalto's Typed Access implementation is complete,
fully functional, and fast too.
There are also a few bug fixes; see release-notes/VERSION for details.
No new work was done on the async side; the main goal at this point is to
complete the Stax and Stax2 API implementations.

On a related note, licensing for Aalto has been clarified; there are
now 2 licensing models:

- GPL (plain), for free usage.
- Commercial (for-fee) licensing, for those who want full control over how
they use Aalto, including redistribution.

If anyone is interested in the latter, let me know; GPL usage is as simple
as downloading the jars (and being aware of how the GPL affects
distribution of an app that uses GPL'd code, etc.).

Please let me know how the new version works for you.

-+ Tatu +-

#22 From: Tatu Saloranta <tsaloranta@...>
Date: Tue Jan 27, 2009 7:28 am
Subject: Interesting Aalto reference, linux+jibx+aalto apparently kick butt: [http://technotes.blogs.sapo.pt/]
cowtowncoder
Although the article is a bit light on specifics (or I missed them),
this does look very interesting:

http://technotes.blogs.sapo.pt/

It looks like the Linux + open source Java combination can run circles around
the equivalent Microsoft offering.
That is not altogether surprising, based on the little I have heard about the
.NET stack (etc.), and especially given the maturity of the Java open source
packages used here. But it is good to get confirmation of suspicions.

Also: I am trying to finally tackle the only 2 major pieces still missing
for Stax 1.0 compliance in Aalto (blocking):

* Namespace-repairing mode for XMLStreamWriter
* Coalescing mode for XMLStreamReader.

I will probably start with the repairing mode, and then get the coalescing
mode done. Getting these done might be enough to get us to version
1.0.

Also: I would really appreciate feedback from anyone who has taken
Aalto for a test ride -- while Google will eventually bring some results
to my attention, it would be sweet to hear things directly from users.
:-)

-+ Tatu +-

#23 From: "lfs_neves" <lfs_neves@...>
Date: Tue Jan 27, 2009 12:48 pm
Subject: Re: Interesting Aalto reference, linux+jibx+aalto apparently kick butt: [http://technotes.blogs.sapo.pt/]
lfs_neves
--- In aalto-xml-interest@yahoogroups.com, Tatu Saloranta
<tsaloranta@...> wrote:
>
> Although the article is bit light on specifics (or I missed them),
> this does look very interesting:
>
> http://technotes.blogs.sapo.pt/

You found my little benchmark! :-)
I will gladly answer any questions you have.

You can read the entire discussion with the guy from Microsoft
responsible for the benchmarks here:

http://forums.microsoft.com/MSDNWorkShop/ShowPost.aspx?PostID=4274183&SiteID=64

The discussion is long, but in the end he could reproduce my results.
The only way that WCF could beat aalto/jibx performance-wise was by using
a binary protocol.

>
> looks like Linux + open source java combination can run circles around
> equivalent Microsoft offering.

It's undoubtedly faster, but I would not call it equivalent.
The programming model is different, and there are a bunch of WS-* specs
supported in WCF that this test doesn't address.


>
> Also: I would really appreciate feedback from anyone who has taken
> Aalto to test ride -- while Google will eventually bring some results
> to my attention, it would be sweet to hear things directly from users.

Sorry for the silence.
I kept silent mainly because I didn't experience any problems when
using Aalto; everything just works, only faster :-)
You've done an excellent job.

--
Luis Neves

#24 From: Tatu Saloranta <tsaloranta@...>
Date: Tue Jan 27, 2009 6:22 pm
Subject: Re: Re: Interesting Aalto reference, linux+jibx+aalto apparently kick butt: [http://technotes.blogs.sapo.pt/]
cowtowncoder
On Tue, Jan 27, 2009 at 4:48 AM, lfs_neves <lfs_neves@...> wrote:
> --- In aalto-xml-interest@yahoogroups.com, Tatu Saloranta
> <tsaloranta@...> wrote:
>>
>> Although the article is bit light on specifics (or I missed them),
>> this does look very interesting:
>>
>> http://technotes.blogs.sapo.pt/
>
> You found my little benchmark! :-)
> I will gladly answer any questions you have.
>
> You can read the entire discussion with the guy from Microsoft
> responsible for the benchmarks here:
>
> http://forums.microsoft.com/MSDNWorkShop/ShowPost.aspx?PostID=4274183&SiteID=64

Thanks! Very interesting discussion, and thank you for getting back to
me so fast.
This feedback is very useful, and it is always good to get to know
people who test the performance of WS stacks. I wish more developers did
that; there aren't enough independent tests around to give a good idea
of the real differences. That goes even more for other systems where XML
parsing/generation is a major component of performance.

-+ Tatu +-

#25 From: Tatu Saloranta <tsaloranta@...>
Date: Thu Jan 29, 2009 5:58 pm
Subject: Implemented namespace-repairing mode, then need coalescing mode, formalize API for non-blocking parser
cowtowncoder
Ok, of the 2 missing Stax 1.0 features, one is now fully implemented. The
next release (0.9.4) will contain a fully functioning
namespace-repairing mode for XMLStreamWriter; it passes all staxtest
and stax2test test cases.
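For reference, namespace-repairing mode is switched on through the standard Stax property `XMLOutputFactory.IS_REPAIRING_NAMESPACES`. A minimal sketch, assuming only the standard javax.xml.stream API: it runs against whichever implementation `XMLOutputFactory.newInstance()` finds (the JDK default unless Aalto or Woodstox is on the classpath), and the class name `RepairingWriterDemo` is made up for illustration.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class RepairingWriterDemo {
    // Writes one prefixed element without ever calling writeNamespace();
    // in repairing mode the writer itself must add the xmlns declaration.
    static String writeRepaired() throws XMLStreamException {
        XMLOutputFactory factory = XMLOutputFactory.newInstance();
        factory.setProperty(XMLOutputFactory.IS_REPAIRING_NAMESPACES, Boolean.TRUE);
        StringWriter out = new StringWriter();
        XMLStreamWriter w = factory.createXMLStreamWriter(out);
        w.writeStartElement("ns", "root", "urn:example:ns");
        w.writeCharacters("hello");
        w.writeEndElement();
        w.flush(); // push any internally buffered output into the StringWriter
        w.close();
        return out.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        // Should print <ns:root ...> with an xmlns:ns declaration for urn:example:ns
        System.out.println(writeRepaired());
    }
}
```

The interesting part is what the code does not do: without repairing enabled, the same calls would produce a `ns:` prefix with no declaration, i.e. output that is not namespace-well-formed.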

The next immediate task will be implementing the last major feature,
coalescing mode. After this, 1.0 could be finalized.

However, I think it would also make sense to add one more task for
1.0: formalizing the API used to feed the non-blocking variant of
XMLStreamReader. Unlike blocking readers, which take an InputStream
or Reader (or references that can be used to create these), the
non-blocking reader will not read any of its input itself. Rather, the
calling app has to feed it new chunks of content once the parser is done
with the current chunk.

Currently the non-blocking parser prototype works as follows:

---
         // ReaderConfig, AsyncUtfScanner, StreamReaderImpl and AsyncByteScanner
         // are Aalto prototype classes (imports omitted); from the JDK this
         // needs java.io.* and javax.xml.stream.XMLStreamConstants.
         InputStream in = new FileInputStream(file); // just to generate the input; usually would be NIO-based
         final byte[] buf = new byte[3000];

         ReaderConfig cfg = new ReaderConfig();
         cfg.setActualEncoding("UTF-8"); // no encoding auto-detect yet (will be added)

         // will need a factory, can't use XMLInputFactory as is
         AsyncUtfScanner asc = new AsyncUtfScanner(cfg);
         StreamReaderImpl sr = new StreamReaderImpl(asc);

         main_loop:
         while (true) {
             int type;

             // We will feed chunked input 3 bytes at a time, for
             // test/demo purposes (even one byte would work)
             while ((type = sr.next()) == AsyncByteScanner.EVENT_INCOMPLETE) {
                 int len = in.read(buf, 1, 3);
                 if (len < 0) { // shouldn't happen in the middle of a partial token
                     System.err.println("Error: Unexpected EOF");
                     break main_loop;
                 }
                 asc.addInput(buf, 1, len);
             }
             if (type == XMLStreamConstants.END_DOCUMENT) {
                 // to trigger this, caller must signal actual end of input
                 break;
             }
             // otherwise, handle the token; all data is available without blocking
         }

---

which is clearly not ready for production use: wires sticking out of a
rat's-nest kind of box. :-)

But the basic idea is simple: the caller needs to handle the EVENT_INCOMPLETE
return type, feed more data, and indicate end of input when appropriate
(which may throw an exception etc.), but otherwise work normally.
Once a non-incomplete event is returned, all data associated with it will
be available without blocking.
Memory usage will be bounded by the amount of memory needed for a single
event (plus some state for nesting); specifically, the length of
individual text segments will be limited to the chunk size that the
application gives. That is, CHARACTERS/CDATA is returned as soon as at
least one character has been decoded (and contains at most the contents
of the whole chunk passed).
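To make the feed/next contract concrete, here is a self-contained toy in Java that mirrors the prototype loop above. It is not Aalto's API: `ToyPushScanner`, `feed`, `endOfInput` and the `INCOMPLETE`/`TAG`/`TEXT`/`END` codes are hypothetical, and it only splits ASCII input into tags and text (no attributes, entities or multi-byte decoding). It does show the behaviour described: `next()` reports INCOMPLETE until a token is fully buffered, text is handed out as soon as any is available (so segments are bounded by the chunk size), and the caller signals the real end of input.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy push scanner, NOT Aalto's API: ASCII only, tags and text only.
public class ToyPushScanner {
    static final int INCOMPLETE = 0, TAG = 1, TEXT = 2, END = 3;

    private final StringBuilder pending = new StringBuilder(); // fed but not yet consumed
    private boolean inputEnded;
    private String token;

    // The caller pushes the next chunk; the scanner never reads input itself.
    void feed(byte[] buf, int offset, int len) {
        for (int i = offset; i < offset + len; i++) {
            pending.append((char) (buf[i] & 0xFF)); // ASCII assumption
        }
    }

    void endOfInput() { inputEnded = true; }

    String token() { return token; }

    // Returns the next complete event, or INCOMPLETE if more input is needed.
    int next() {
        if (pending.length() == 0) {
            return inputEnded ? END : INCOMPLETE;
        }
        if (pending.charAt(0) == '<') {
            int close = pending.indexOf(">");
            if (close < 0) {
                if (inputEnded) throw new IllegalStateException("Unexpected EOF inside a tag");
                return INCOMPLETE; // partial tag: wait for the next chunk
            }
            token = pending.substring(0, close + 1);
            pending.delete(0, close + 1);
            return TAG;
        }
        // Text is returned as soon as any is buffered, up to the next '<'.
        int lt = pending.indexOf("<");
        int end = (lt < 0) ? pending.length() : lt;
        token = pending.substring(0, end);
        pending.delete(0, end);
        return TEXT;
    }

    // Driver: feed `chunk` bytes whenever the scanner reports INCOMPLETE.
    static List<String> parseInChunks(byte[] input, int chunk) {
        ToyPushScanner sc = new ToyPushScanner();
        List<String> events = new ArrayList<>();
        int pos = 0;
        while (true) {
            int type;
            while ((type = sc.next()) == INCOMPLETE) {
                if (pos >= input.length) {
                    sc.endOfInput(); // caller must signal the actual end of input
                    continue;
                }
                int len = Math.min(chunk, input.length - pos);
                sc.feed(input, pos, len);
                pos += len;
            }
            if (type == END) return events;
            events.add((type == TAG ? "tag:" : "text:") + sc.token());
        }
    }

    public static void main(String[] args) {
        byte[] doc = "<a>hello</a>".getBytes(java.nio.charset.StandardCharsets.US_ASCII);
        // prints [tag:<a>, text:hel, text:lo, tag:</a>]
        System.out.println(parseInChunks(doc, 3));
    }
}
```

With 3-byte chunks the text "hello" comes out as two segments, "hel" and "lo", which is the bounded-text-segment behaviour from the paragraph above; with one big chunk it comes out whole.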

Using such a non-blocking parser, it should be quite easy to build a
single-threaded (or N-threaded, for N cores/CPUs) XML input handling
server; one that would perform nicely and could apply elaborate
throttling if need be.

One more thing that would be good to investigate is how easy it would
be to implement the SAX API on top of the non-blocking stream reader.
That should not be very hard -- the blocking stream reader can already
be used as a SAX parser via JAXP (or directly).

Thoughts, comments, suggestions?

-+ Tatu +-

#26 From: Tatu Saloranta <tsaloranta@...>
Date: Sat Jan 31, 2009 6:03 am
Subject: WS testing (was Re: Re: Interesting Aalto reference, linux+jibx+aalto...)
cowtowncoder
Hi Luis! One idea occurred to me today: I don't know how easy it would
be to do, but looking at the wstest home page, it might be doable,
depending on how tightly coupled it is with SOAP and/or XML.

Anyway: I don't know if you are familiar with JSON, or the fastest
Java JSON processor, Jackson (http://jackson.codehaus.org). I wrote
Jackson based on my experiences with Woodstox and Aalto, so it is rather
fast as well. In fact, it is slightly faster than Aalto for most cases,
although that's mostly due to JSON/XML differences.

But more interesting than just JSON reading or writing, the Jackson
package also implements full data-binding support (via the ObjectMapper
class); essentially a subset of JAXB functionality (and similar to
JiBX).
It is a subset because there is no standard schema language for JSON, so
only the code-first approach is supported.
Object deserialization works like:

MyBean bean = new ObjectMapper().readValue(
    new StringReader("{ \"count\" : 1, \"name\" : \"jackson\" }"), MyBean.class);

and serialization similarly:

new ObjectMapper().writeValue(new FileWriter("result.json"), anyValueObject);

(these use convenience methods; there are also full methods that
allow using JsonParser, JsonGenerator etc.)

Given the above, it might be quite easy to implement a JSON-based web
service, where data binding is done using Jackson instead of JAXB. I
would expect such a service to still be faster than JiBX.

What do you think?

-+ Tatu +-

#27 From: "lfs_neves" <lfs_neves@...>
Date: Mon Feb 2, 2009 4:56 pm
Subject: WS testing (was Re: Re: Interesting Aalto reference, linux+jibx+aalto...)
lfs_neves
--- In aalto-xml-interest@yahoogroups.com, Tatu Saloranta
<tsaloranta@...> wrote:

> Given above, it might be quite easy to implement json-based web
> service, where data binding is done using Jackson instead of JAXB. I
> would expect such a service to be still faster than Jibx
>
> What do you think?

It sounds cool; I was thinking of something similar but using Google
Protocol Buffers, so I might as well test JSON.
It would make a nice comparison.

I will try to make something soon.

Regards.

--
Luis Neves

#28 From: Tatu Saloranta <tsaloranta@...>
Date: Mon Feb 2, 2009 5:53 pm
Subject: Re: WS testing (was Re: Re: Interesting Aalto reference, linux+jibx+aalto...)
cowtowncoder
On Mon, Feb 2, 2009 at 8:56 AM, lfs_neves <lfs_neves@...> wrote:
> --- In aalto-xml-interest@yahoogroups.com, Tatu Saloranta
> <tsaloranta@...> wrote:
>
>> Given above, it might be quite easy to implement json-based web
>> service, where data binding is done using Jackson instead of JAXB. I
>> would expect such a service to be still faster than Jibx
>>
>> What do you think?
>
> It sounds cool, I was thinking something similar but using Google
> Protocol Buffers, I might as well test json.
> It would make a nice comparison.

Cool! I have tested PB earlier, and it's quite easy. The main
challenge was that it's a bit of apples & oranges, given how tightly
coupled PB is. Messages are not self-contained (without the schema you
have little idea what the data is about, since fields are identified
only by integer tags), and you can't really bind data to other objects;
AFAIK you must use the objects PB generates.
That's ok as long as the test framework doesn't have problems with it --
in my case it was a bit problematic, but I was able to try it out by
refactoring code. Or you can wrap PB objects with beans, although
that's akin to writing a data binding lib of your own. :-)

But it would be very interesting to see how different formats & libs compare!
So let me know how things work.

-+ Tatu +-

#29 From: Tatu Saloranta <tsaloranta@...>
Date: Thu Feb 5, 2009 5:58 am
Subject: Version 0.9.4 released: now Aalto is a COMPLETE stax 1.0 implementation!
cowtowncoder
Send Email Send Email
 
After finishing the namespace-repairing mode for stream writers,
implementing coalescing mode, and ensuring that both pass 100% of the
existing staxtest and stax2test unit test suites, it is time for one
of the last pre-1.0 releases.
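Coalescing mode is the reader-side counterpart, enabled with the standard `XMLInputFactory.IS_COALESCING` property. The sketch below uses only the standard javax.xml.stream API (so it runs on the JDK default implementation as well as on Aalto or Woodstox); `CoalescingDemo` is a made-up name. It collects the text events reported for a document that mixes plain text and a CDATA section.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class CoalescingDemo {
    // Returns the text of every CHARACTERS/CDATA event the reader reports.
    static List<String> textEvents(String xml, boolean coalescing) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.IS_COALESCING, coalescing);
        XMLStreamReader r = factory.createXMLStreamReader(new StringReader(xml));
        List<String> texts = new ArrayList<>();
        while (r.hasNext()) {
            int ev = r.next();
            if (ev == XMLStreamConstants.CHARACTERS || ev == XMLStreamConstants.CDATA) {
                texts.add(r.getText());
            }
        }
        r.close();
        return texts;
    }

    public static void main(String[] args) throws XMLStreamException {
        String doc = "<a>one<![CDATA[ two ]]>three</a>";
        System.out.println(textEvents(doc, true));  // one event: "one two three"
        System.out.println(textEvents(doc, false)); // may be split; implementation-dependent
    }
}
```

With coalescing on, adjacent text and CDATA must arrive as a single CHARACTERS event; with it off, an implementation is free to report several events, so the second line can differ between parsers.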

At this point, the main thing that remains to be wrapped up is the
non-blocking parser, which is mostly functional but lacks the following:

(a) An API extension/alternative to Stax, since Stax does not cover
non-blocking cases:
   * constructing non-blocking parsers
   * feeding content (can't use an input stream or reader, since they are blocking)
(b) Implementation of bootstrapping (auto-detection of encoding,
parsing of the xml declaration).

and of course some documentation for the non-blocking API.
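Item (b), encoding auto-detection, can follow the table in Appendix F of the XML 1.0 specification: the first four bytes are enough to tell a byte order mark, or the UTF-16/UCS-4 byte pattern of `<` / `<?`, apart from an ASCII-compatible encoding. A minimal sketch of that table, with EBCDIC and the unusual UCS-4 byte orders left out and UCS-4 reported as UTF-32; `EncodingSniffer` is a hypothetical name, not Aalto code. The one non-blocking twist is that it returns null, meaning "feed me more", when fewer than four bytes have arrived.

```java
public class EncodingSniffer {
    // Guesses the encoding from the first bytes, following the XML 1.0
    // Appendix F.1 table (EBCDIC and unusual UCS-4 byte orders omitted).
    // Returns null if fewer than 4 bytes are available: a non-blocking
    // parser would simply wait for the next chunk.
    static String sniff(byte[] b, int len) {
        if (len < 4) {
            return null;
        }
        int b0 = b[0] & 0xFF, b1 = b[1] & 0xFF, b2 = b[2] & 0xFF, b3 = b[3] & 0xFF;
        // With a byte order mark (4-byte UCS-4 marks must be checked before UTF-16's)
        if (b0 == 0x00 && b1 == 0x00 && b2 == 0xFE && b3 == 0xFF) return "UTF-32BE";
        if (b0 == 0xFF && b1 == 0xFE && b2 == 0x00 && b3 == 0x00) return "UTF-32LE";
        if (b0 == 0xFE && b1 == 0xFF) return "UTF-16BE";
        if (b0 == 0xFF && b1 == 0xFE) return "UTF-16LE";
        if (b0 == 0xEF && b1 == 0xBB && b2 == 0xBF) return "UTF-8";
        // Without a BOM: "<" or "<?" encoded in a wider encoding
        if (b0 == 0x00 && b1 == 0x00 && b2 == 0x00 && b3 == 0x3C) return "UTF-32BE";
        if (b0 == 0x3C && b1 == 0x00 && b2 == 0x00 && b3 == 0x00) return "UTF-32LE";
        if (b0 == 0x00 && b1 == 0x3C && b2 == 0x00 && b3 == 0x3F) return "UTF-16BE";
        if (b0 == 0x3C && b1 == 0x00 && b2 == 0x3F && b3 == 0x00) return "UTF-16LE";
        // "<?xm" or anything else: an ASCII-compatible encoding. Default to UTF-8
        // until the encoding pseudo-attribute of the xml declaration is parsed.
        return "UTF-8";
    }

    public static void main(String[] args) {
        byte[] decl = "<?xml version='1.0'?>".getBytes(java.nio.charset.StandardCharsets.US_ASCII);
        System.out.println(sniff(decl, decl.length));                                  // UTF-8
        System.out.println(sniff(new byte[] {(byte) 0xFF, (byte) 0xFE, 0x3C, 0x00}, 4)); // UTF-16LE
        System.out.println(sniff(new byte[] {0x3C, 0x3F}, 2));                         // null
    }
}
```

For the ASCII-compatible case the sniffed value is only provisional: the parser still has to read the xml declaration, which is the other half of item (b).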

But for blocking use cases (all existing Stax and Stax2 use cases), Aalto
is getting pretty close to ready for production use!

-+ Tatu +-

#30 From: "lfs_neves" <lfs_neves@...>
Date: Sat Feb 7, 2009 11:07 am
Subject: Re: Version 0.9.4 released: now Aalto is a COMPLETE stax 1.0 implementation!
lfs_neves
--- In aalto-xml-interest@yahoogroups.com, Tatu Saloranta
<tsaloranta@...> wrote:
>
> After finishing the namespace-repairing mode for stream writers,
> implementing coalescing mode, and ensuring that both pass 100% with
> existing staxtest and stax2test unit test suites, it is time for one
> of last pre-1.0 releases.

I'm getting a 404:
/hatchery/aalto/0.9.4/aalto-gpl-0.9.4.jar was not found on this server.

--
Luis Neves

#31 From: Tatu Saloranta <tsaloranta@...>
Date: Sat Feb 7, 2009 5:21 pm
Subject: Re: Re: Version 0.9.4 released: now Aalto is a COMPLETE stax 1.0 implementation!
cowtowncoder
My bad -- the jars were copied one directory too high. Should work now.

-+ Tatu +-

#32 From: Tatu Saloranta <tsaloranta@...>
Date: Sat Feb 7, 2009 5:25 pm
Subject: Quick note: Aalto rocks Xalan, Saxon (via SAX)
cowtowncoder
Quick note: I am doing performance testing to see how fast Aalto
is when used as the SAX parser for Xalan and Saxon.
This is easy to do: Aalto has JAXP factory and parser implementations
under "org.codehaus.wool.sax"; just construct a SAXParserFactoryImpl,
and go from there.
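Putting a particular SAX parser in front of an XSLT processor goes through standard JAXP: take an `XMLReader` from the chosen `SAXParserFactory` and wrap it in a `SAXSource`. The sketch below uses the JDK's default parser factory and an identity transform so that it runs anywhere; to try Aalto, one would construct its `SAXParserFactoryImpl` (from the package named above) and pass that in instead. `SaxIntoXslt` is a made-up name, and which XSLT engine runs (the JDK's built-in one, Xalan, Saxon) depends on the classpath.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

public class SaxIntoXslt {
    // Runs a transform whose input is parsed by the given SAX parser factory.
    static String transform(SAXParserFactory parserFactory, String xml) throws Exception {
        parserFactory.setNamespaceAware(true); // XSLT engines expect namespace-aware SAX events
        XMLReader reader = parserFactory.newSAXParser().getXMLReader();
        SAXSource source = new SAXSource(reader, new InputSource(new StringReader(xml)));

        // newTransformer() with no stylesheet is an identity copy;
        // pass a stylesheet Source to newTransformer(...) for real XSLT.
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(source, new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // JDK default parser here; swap in another SAXParserFactory to compare parsers
        System.out.println(transform(SAXParserFactory.newInstance(), "<a><c>t</c></a>"));
    }
}
```

Since only the factory argument changes, this is the "simple jar change" the comparison below relies on: the XSLT side of the pipeline stays identical.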

Initial results are very encouraging: compared to Xerces (2.9.1),
Woodstox is a bit faster, and Aalto is faster than Woodstox by a similar margin.
Obviously there's more overhead with XSLT processing than with just XML
parsing, but I think a 30-40% performance boost from a simple jar change
sounds pretty good to me.

I hope to publish these (and other) results in the near future, but
thought I'd give a quick preview at this point; I think the results
themselves are sound, I just need to polish the presentation.
Plus, it should be easy to reproduce my findings too.

-+ Tatu +-

#33 From: "lfs_neves" <lfs_neves@...>
Date: Sun Feb 8, 2009 6:51 pm
Subject: WS testing (was Re: Re: Interesting Aalto reference, linux+jibx+aalto...)
lfs_neves
--- In aalto-xml-interest@yahoogroups.com, Tatu Saloranta
<tsaloranta@...> wrote:
>
> Hi Luis! One idea occurred to me today: I don't know how easy it would
> be to do, but looking at wstest home page, it might be doable,
> depending on how tightly coupled it is with soap and/or xml.

...

> Given above, it might be quite easy to implement json-based web
> service, where data binding is done using Jackson instead of JAXB. I
> would expect such a service to be still faster than Jibx

I've just posted my test results with JSON as an alternative
serialization mechanism using the Jackson processor... yes, it is fast:

http://technotes.blogs.sapo.pt/1708.html


I've had a small issue in the process of porting the tests to JSON:
Jackson serialized 0.0f as "0.0" but was unable to deserialize it
back; it errors out with the message:
"java.lang.Float from String value '0.0': overflow/underflow, value
can not be represented as a 32-bit float"

Other than that it was painless. You've written another great parser!

Regards

--
Luis Neves

#34 From: Tatu Saloranta <tsaloranta@...>
Date: Mon Feb 9, 2009 4:47 am
Subject: Re: WS testing (was Re: Re: Interesting Aalto reference, linux+jibx+aalto...)
cowtowncoder
On Sun, Feb 8, 2009 at 10:51 AM, lfs_neves <lfs_neves@...> wrote:
> --- In aalto-xml-interest@yahoogroups.com, Tatu Saloranta
> <tsaloranta@...> wrote:
>>
>> Hi Luis! One idea occurred to me today: I don't know how easy it would
>> be to do, but looking at wstest home page, it might be doable,
>> depending on how tightly coupled it is with soap and/or xml.
>
> ...
>
>> Given above, it might be quite easy to implement json-based web
>> service, where data binding is done using Jackson instead of JAXB. I
>> would expect such a service to be still faster than Jibx
>
> I've just posted my test results with JSON as an alternative
> serialization mechanism using the Jackson Processor... yes it is fast:
>
> http://technotes.blogs.sapo.pt/1708.html

Great!

> I've had a small issue in the process of porting the tests to JSON,
> Jackson serialized 0.0f as "0.0" but was unable to deserialize it
> back, it errors out with the message:
> "java.lang.Float from String value '0.0': overflow/underflow, value
> can not be represented as a 32-bit float"

Ah, thanks, that sounds like a bug -- I think it may be due to my
misunderstanding some of the constants in the Float/Double classes
(MIN_VALUE is the smallest positive value, not the negative number with
the highest absolute value).
I'll definitely need to fix this.
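The mix-up is easy to reproduce with plain JDK constants: `Float.MIN_VALUE` is the smallest positive float (about 1.4E-45), while the most negative finite float is `-Float.MAX_VALUE`. The sketch below is a hypothetical range check, not Jackson's actual code: it shows how treating `MIN_VALUE` as a lower bound rejects 0.0 (and every negative number) with exactly this kind of "overflow/underflow" complaint, next to a check that only rejects magnitudes a float cannot hold.

```java
public class FloatRangeCheck {
    // Hypothetical buggy check: assumes Float.MIN_VALUE is the most negative float.
    // MIN_VALUE is actually ~1.4E-45, so this rejects 0.0 and all negatives.
    static boolean buggyFitsInFloat(double d) {
        return d >= Float.MIN_VALUE && d <= Float.MAX_VALUE;
    }

    // Corrected check: only magnitudes beyond Float.MAX_VALUE overflow.
    // (NaN also fails this comparison; true underflow detection, i.e. a
    // non-zero value that rounds to 0.0f, is left out of this sketch.)
    static boolean fitsInFloat(double d) {
        return Math.abs(d) <= Float.MAX_VALUE;
    }

    public static void main(String[] args) {
        System.out.println(Float.MIN_VALUE);        // smallest positive float
        System.out.println(buggyFitsInFloat(0.0));  // false: the reported symptom
        System.out.println(fitsInFloat(0.0));       // true
        System.out.println(fitsInFloat(-1.5));      // true
        System.out.println(fitsInFloat(1e39));      // false: above Float.MAX_VALUE (~3.4E38)
    }
}
```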

> Other than that it was painless. You've written another great parser!

Thank you.

-+ Tatu +-

#35 From: Tatu Saloranta <tsaloranta@...>
Date: Wed Mar 11, 2009 6:22 am
Subject: Aalto performance compared to Thrift, Protocol Buffers; using someone else's tests
cowtowncoder
Send Email Send Email
 
Ok, here's some more interesting benchmark data. Let's start with a
measurement done by someone not associated with Aalto project:

http://www.eishay.com/2008/11/protobuf-with-option-optimize-for-speed.html

Doesn't look too good for Stax? Well, I thought I'd figure out what's
going on. It turns out that:

(a) the Stax implementation used is the reference implementation (yuck)
(b) for each single serialization/deserialization, a new
XMLInput/OutputFactory is created via factory.newInstance(). OUCH!

Fixing these obvious flaws, starting with switching to Woodstox, improves
reading speed by 8x and writing by 10x. That brings the Stax-based
solution to about 40% of the binary formats' speed for reading, and
almost 100% for writing (binary formats tend to be relatively faster to
read than write).
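Flaw (b) is a common mistake, so here is the fixed pattern spelled out. `XMLInputFactory.newInstance()` performs the implementation lookup and builds a fresh factory on every call, so it belongs outside the per-message path. A minimal sketch using only the standard javax.xml.stream API; the class name and sample message are made up, and which implementation is found (JDK default, Woodstox, Aalto) depends on the classpath.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class ReuseFactoryDemo {
    // Looked up and created once, not once per message. (Used here from a single
    // thread; check your implementation's docs before sharing it across threads.)
    private static final XMLInputFactory FACTORY = XMLInputFactory.newInstance();

    // Per-message work is only creating a reader from the shared factory.
    static int countElements(String xml) throws XMLStreamException {
        XMLStreamReader r = FACTORY.createXMLStreamReader(new StringReader(xml));
        int count = 0;
        try {
            while (r.hasNext()) {
                if (r.next() == XMLStreamConstants.START_ELEMENT) {
                    count++;
                }
            }
        } finally {
            r.close();
        }
        return count;
    }

    public static void main(String[] args) throws XMLStreamException {
        String msg = "<media><title>x</title><size>3</size></media>";
        int total = 0;
        for (int i = 0; i < 1000; i++) {
            total += countElements(msg);
        }
        System.out.println(total); // 3 elements x 1000 messages = 3000
    }
}
```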

But plug in Aalto, and the results (numbers are milliseconds) are:
---
using Aalto as Stax impl:

warming up...
Starting
          Object create  Serialization  Deserialization  Serialized size
thrift       1304.13260    23069.41900      24145.53400              314
protobuf     2081.43830    26319.83200      15060.01900              217
java          973.97880    75996.27200     260578.72200              845
scala         655.33490   118616.22600     548926.90300             1473
stax         1003.50770    17027.86700      27728.39200              406

---

And it turns out that for this (real-world, I think) use case, Aalto

(a) is a bit faster at writing data than either Thrift or Protocol
Buffers (17,028 vs 23,069 vs 26,320)
(b) is a bit slower at reading data (27,728 vs 24,146 vs 15,060)
(c) -> end-to-end, all 3 are about as fast (44,756 vs 47,215 vs 41,380)

(and this despite the fact that message sizes are roughly 400:300:200)

So, it appears that Aalto is pretty efficient at what it does. I mean,
Protocol Buffers is supposed to be, what, 10-100x faster than XML.
So Aalto must be 10x-100x faster than the format it deals with would
suggest. Not too shabby!

-+ Tatu +-

#36 From: Tatu Saloranta <tsaloranta@...>
Date: Sat Mar 21, 2009 6:00 am
Subject: Minor release, 0.9.5, renamed packages -> com.fasterxml.aalto
cowtowncoder
Quick note: I released 0.9.5, which has only one externally visible
change (in addition to some internal cleanup):
all code is now under the package "com.fasterxml.aalto" (instead of the
older 'org.codehaus.wool', which was a leftover).
This does not necessarily require any changes to app code (since the
services files use the new factory class names), unless the application
references Aalto classes directly, or uses dependency injection to define
the implementation.
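A quick way to check which Stax implementation the services lookup actually picks up (for example, to confirm the 0.9.5 jar is the one in use after the rename) is to print the factory classes. This uses only the standard `newInstance()` lookup; `WhichStaxImpl` is a made-up name. An app that names the implementation explicitly, e.g. through the `javax.xml.stream.XMLInputFactory` system property or dependency injection, is the case that needs updating to the new `com.fasterxml.aalto` class names (the exact class names are not listed in this post).

```java
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;

public class WhichStaxImpl {
    // newInstance() consults, among other places, the javax.xml.stream.*
    // system properties and META-INF/services entries on the classpath
    // before falling back to the JDK default implementation.
    static String inputFactoryClass() {
        return XMLInputFactory.newInstance().getClass().getName();
    }

    static String outputFactoryClass() {
        return XMLOutputFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        // With the Aalto 0.9.5 jar on the classpath, these should name classes
        // under com.fasterxml.aalto rather than org.codehaus.wool.
        System.out.println(inputFactoryClass());
        System.out.println(outputFactoryClass());
    }
}
```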

In the (hopefully) near future, I will move the Aalto download pages
under http://fasterxml.com as well, but for now they are still
available from cowtowncoder.com.

-+ Tatu +-

ps. I will try to get this:
  http://www.eishay.com/2009/03/more-on-benchmarking-java-serialization.html
  updated to also include Aalto (Woodstox and Jackson are included now),
since it should showcase Aalto's performance. Especially if I also add
Fast Infoset...

#37 From: Tatu Saloranta <tsaloranta@...>
Date: Tue Mar 31, 2009 6:51 pm
Subject: Performance comparison, external
cowtowncoder
(cc:ing to woodstox-users/dev, since it is related to Woodstox too...
as well as Aalto)

Apologies for posting this again, but I think that it's good to check out:

http://code.google.com/p/thrift-protobuf-compare/wiki/Benchmarking

since there have been some changes, more candidates compared, and so on.

So while it's not 100% clear which is the fastest choice (Protocol
Buffers, Thrift or JSON with Jackson), it's safe to say that the
performance differences between these candidates are not huge. And
that's sort of amazing, considering how many things one has to give up
when using non-self-descriptive binary formats, without getting
order-of-magnitude faster performance. :)

Of course all the usual disclaimers apply: benchmarks are always
unfair, not relevant to all use cases, and so on.
But at least here the code is open source, the methodology is simple and
repeatable, and the implementations chosen are best-of-breed for their
data formats.

-+ Tatu +-

#38 From: "plantfern" <fern@...>
Date: Tue Oct 13, 2009 11:30 pm
Subject: Status of Project??
plantfern
Hi, I believe there is some interest in having an XML parser that supports NIO (
http://stackoverflow.com/questions/1045544/stax-parsing-from-java-nio-channel ).
It looks like Aalto might be the only option at the moment, but it seems to
have stalled.

What is the current status of the project?  Any interest by developers?

#39 From: Tatu Saloranta <tsaloranta@...>
Date: Wed Oct 14, 2009 12:34 am
Subject: Re: Status of Project??
cowtowncoder
Interesting -- hadn't seen that entry.
Current status is "waiting for interested users". :-)

Meaning that the current functionality (core Stax 1.0 implementation, plus most of the Stax2 extensions) is stable, usable and complete.
But the next steps would be big ones (DTD support, hooking up the Stax2 validation API), except for the fairly simple things needed to complete the async API.

So... if there is interest in the NIO part, we would be interested in working with others in this area.
One immediate thing to work on is just the API (how to feed content into the parser, nothing fancy, and how to check whether the current state is an acceptable end state for parsing); the second is hooking up the rest to do the feeding.
Finally, there is one missing piece wrt parsing: handling of the xml prolog. That is not a huge undertaking, but it needs to be completed for real use (for now, one just has to strip out the xml declaration to test async functionality).

Put another way: the project is not dead, I have just been busy with other projects (mostly the Jackson JSON parser).
Also, while adoption has been limited, there is at least one product now shipping with Aalto, so maybe it is time to start "selling" Aalto a bit more.

-+ Tatu +-

#40 From: Tatu Saloranta <tsaloranta@...>
Date: Thu Oct 15, 2009 5:43 am
Subject: Anyone interested in helping with non-blocking/async parsing use cases?
cowtowncoder
Hi there! It has been a while since there's been significant progress
with Aalto -- mostly just because of other competing things going
on, but partly due to:

(a) the core blocking (traditional) parser being feature complete, up to
complete Stax 1.0 compliance, as well as the Stax2 Typed Access API
implementation (there's still DTD handling and the Stax2 Validation
API to add, but those are bigger undertakings)
(b) apparent lack of interest in non-blocking parsing

But during the past week I have had multiple contacts from developers who
would be interested in finding a non-blocking XML parser. Since Aalto
is almost there, I would be interested in completing the minor missing
pieces.
To do that, what I really could use is a simple use case into which to
plug such a component: ideally, a library, app or framework that
accesses data using NIO (directly or via something like Netty), so that
I'd have something I could actually test Aalto with in non-blocking mode.
While I could write a toy test app, that does not seem right -- it's
better to handle a real use case.

So... anyone with something I could use? Or willing to
collaborate on getting something like this done?

-+ Tatu +-

#41 From: "plantfern" <fern@...>
Date: Thu Oct 15, 2009 2:00 pm
Subject: Re: Anyone interested in helping with non-blocking/async parsing use cases?
plantfern
It looks like the main motivator that I find is XMPP processing, since it is
based on an XML stream. It's like an endless document: one root element (
<stream:stream> ), and lots of elements underneath that root element that carry
the communications.

So no DTD/Schema validation is required. Because of the limitless nature, we
also need a way to emit a DOM DocumentFragment for each element, without creating
one huge document.
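The per-stanza shape described here can be sketched with the standard StAX and DOM APIs: step onto the `<stream:stream>` root without building it, then turn each child element into its own small DOM tree, so memory stays bounded by one stanza. This sketch uses an ordinary blocking `XMLStreamReader` over a string purely to illustrate that shape; in the non-blocking setting under discussion, the same loop would sit on top of a parser fed with chunks from Mina. `StanzaReader` and its methods are hypothetical, and each stanza is returned as an `Element` (wrapping it in a `DocumentFragment` would be one more DOM call).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class StanzaReader {
    // Returns each child of the root as its own DOM element; the root is never built.
    static List<Element> readStanzas(String xml) throws Exception {
        XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        List<Element> stanzas = new ArrayList<>();
        r.nextTag(); // positions on <stream:stream>
        // nextTag() skips whitespace between stanzas; END_ELEMENT means the stream closed
        while (r.nextTag() == XMLStreamConstants.START_ELEMENT) {
            Document owner = dbf.newDocumentBuilder().newDocument(); // one small doc per stanza
            stanzas.add(buildElement(r, owner));
        }
        return stanzas;
    }

    // Recursively copies the element the reader is on (attributes, text, children).
    private static Element buildElement(XMLStreamReader r, Document doc) throws Exception {
        Element e = doc.createElementNS(emptyToNull(r.getNamespaceURI()),
                qName(r.getPrefix(), r.getLocalName()));
        for (int i = 0; i < r.getAttributeCount(); i++) {
            e.setAttributeNS(emptyToNull(r.getAttributeNamespace(i)),
                    qName(r.getAttributePrefix(i), r.getAttributeLocalName(i)),
                    r.getAttributeValue(i));
        }
        while (true) {
            int ev = r.next();
            if (ev == XMLStreamConstants.START_ELEMENT) {
                e.appendChild(buildElement(r, doc));
            } else if (ev == XMLStreamConstants.CHARACTERS || ev == XMLStreamConstants.CDATA
                    || ev == XMLStreamConstants.SPACE) {
                e.appendChild(doc.createTextNode(r.getText()));
            } else if (ev == XMLStreamConstants.END_ELEMENT) {
                return e; // reader is left on this element's end tag
            }
        }
    }

    private static String qName(String prefix, String local) {
        return (prefix == null || prefix.isEmpty()) ? local : prefix + ":" + local;
    }

    private static String emptyToNull(String s) {
        return (s == null || s.isEmpty()) ? null : s;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<stream:stream xmlns:stream='http://etherx.jabber.org/streams'"
                + " xmlns='jabber:client'>"
                + "<message to='a@example.com'><body>hi</body></message> <presence/>"
                + "</stream:stream>";
        for (Element stanza : readStanzas(xml)) {
            System.out.println(stanza.getLocalName() + " " + stanza.getNamespaceURI());
        }
    }
}
```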

Currently the XMPP server I would like to enhance is Vysper, based on the NIO
framework Mina. They do some basic home-brewed XML parsing; moving to a
standard parser might be beneficial, but speed would be important too.

http://mina.apache.org/vysper/

What do you think?  I was just going to create an enhanced version of a SAX
parser for Mina.

#42 From: Tatu Saloranta <tsaloranta@...>
Date: Fri Oct 16, 2009 12:25 am
Subject: Re: Re: Anyone interested in helping with non-blocking/async parsing use cases?
cowtowncoder
On Thu, Oct 15, 2009 at 7:00 AM, plantfern <fern@...> wrote:
> It looks like the main motivator that I find is XMPP processing. Since this is
> based on an XML stream. It's like an endless document, one root element (
> <stream:stream> ), and lots of elements underneath that root element that carry
> the communications.

Makes sense as far as use cases go.

> So no DTD/Schema validation is required. Because of the limitless nature, we
> also need a way to emit DOM DocumentFragment(s) for each element, but not create
> one huge document.

Ok.

> Currently the XMPP server I would like to enhance is Vysper based on the NIO
> framework Mina. They do some basic home-brewed XML parsing, but moving to a
> standard one might be beneficial, but speed would be of importance too.

Yeah -- and Aalto is very heavily optimized for speed; much of the parser
code is shared between the blocking and non-blocking parts (and the rest
was branched fairly recently).

> http://mina.apache.org/vysper/
>
> What do you think? I was just going to create an enhanced version of a SAX
> parser for Mina.

Let me have a look; it sounds interesting so far. Writing XML parsers is
not a trivial task, although doing it for a specific use case of course
helps. But to get the non-blocking part right, it gets quite tricky to
handle everything from character entities to decoding UTF-8 multi-byte
characters. Aalto does implement SAX too, as well as Stax; writing a
SAX parser is slightly easier than Stax, but fundamentally, needing the
ability to "block at any given byte" is the trickiest thing.
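The UTF-8 part of that problem can be illustrated with the JDK's own `CharsetDecoder`, which already supports stopping at any byte: with `endOfInput = false`, an incomplete multi-byte sequence at the end of a chunk is left in the input buffer instead of being reported as an error. A minimal sketch, not how Aalto decodes internally; `ChunkedUtf8` is a made-up name. It decodes input arriving in arbitrary chunks and records what each chunk yields.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class ChunkedUtf8 {
    // Decodes UTF-8 arriving in chunks of `chunkSize` bytes and returns the text
    // produced after each chunk. A multi-byte character split across chunks stays
    // in `pending` until the rest of its bytes arrive.
    static List<String> decodeInChunks(byte[] input, int chunkSize) throws CharacterCodingException {
        CharsetDecoder dec = StandardCharsets.UTF_8.newDecoder(); // reports malformed input
        ByteBuffer pending = ByteBuffer.allocate(chunkSize + 3);  // chunk + at most 3 held-back bytes
        List<String> pieces = new ArrayList<>();
        for (int pos = 0; pos < input.length; pos += chunkSize) {
            int len = Math.min(chunkSize, input.length - pos);
            pending.put(input, pos, len);
            pending.flip();
            CharBuffer out = CharBuffer.allocate(pending.remaining()); // UTF-8 never yields more chars than bytes
            CoderResult cr = dec.decode(pending, out, false);         // false: more input may follow
            if (cr.isError()) {
                cr.throwException();
            }
            pending.compact(); // keep the undecoded tail for the next chunk
            out.flip();
            pieces.add(out.toString());
        }
        // End of input: any bytes still pending would be a truncated character.
        pending.flip();
        CharBuffer tail = CharBuffer.allocate(pending.remaining() + 1);
        CoderResult cr = dec.decode(pending, tail, true);
        if (cr.isError()) {
            cr.throwException();
        }
        dec.flush(tail);
        if (tail.position() > 0) {
            tail.flip();
            pieces.add(tail.toString());
        }
        return pieces;
    }

    public static void main(String[] args) throws CharacterCodingException {
        byte[] bytes = "h\u20ac".getBytes(StandardCharsets.UTF_8); // 'h' + 3-byte euro sign
        // With 2-byte chunks: "h" after the first chunk, the euro sign after the second
        System.out.println(decodeInChunks(bytes, 2).size()); // 2
    }
}
```

Character entities are the harder half, since they need the same hold-back treatment at the tokenizer level rather than in the decoder.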

-+ Tatu +-

#43 From: "plantfern" <fern@...>
Date: Fri Oct 16, 2009 12:37 am
Subject: Re: Anyone interested in helping with non-blocking/async parsing use cases?
plantfern
The only sticking point that the people on the Mina-Vysper mailing list brought up
is the licensing. For Mina-Vysper to be able to use Aalto, the license needs to
be Apache-compatible... which I guess GPL and/or commercial licenses are not :(

What do you think of that?


#44 From: Fernando Padilla <fern@...>
Date: Fri Oct 16, 2009 12:37 am
Subject: Re: Re: Anyone interested in helping with non-blocking/async parsing use cases?
plantfern
 
test



#45 From: Tatu Saloranta <tsaloranta@...>
Date: Fri Oct 16, 2009 6:43 am
Subject: Re: Re: Anyone interested in helping with non-blocking/async parsing use cases?
cowtowncoder
 
On Thu, Oct 15, 2009 at 5:37 PM, plantfern <fern@...> wrote:
>
> The only sticking point that the people on the Mina-Vysper mailing list brought
> up is the licensing. For Mina-Vysper to be able to use Aalto, the license needs
> to be Apache-compatible, which I guess GPL and/or commercial licenses are not :(
>
> What do you think of that?

That could be problematic, yes, knowing the state of affairs between
GPL and Apache camps. :-)

I'll have to think a bit about that: GPL happens to be a reasonable
match with commercial licensing (to divide usage into free and
non-free cases), but it has its downsides too. So it may be time to
revisit the licensing question.

-+ Tatu +-

#46 From: "plantfern" <fern@...>
Date: Fri Oct 16, 2009 6:19 pm
Subject: Re: Anyone interested in helping with non-blocking/async parsing use cases?
plantfern
 
Well, if there is serious discussion about changing the license, you can try
joining the Mina mailing list; this might be something they get pretty
excited about. :)



#47 From: Tatu Saloranta <tsaloranta@...>
Date: Mon Oct 19, 2009 6:09 pm
Subject: Re: Re: Anyone interested in helping with non-blocking/async parsing use cases?
cowtowncoder
 
On Fri, Oct 16, 2009 at 11:19 AM, plantfern <fern@...> wrote:
>
> Well, if there is serious discussion about changing the license, you can try
> joining the Mina mailing list; this might be something they get pretty
> excited about. :)

That's a bit of a chicken-and-egg problem. :)
(and an impedance mismatch between selling a solution and having people
with a problem find you)

But I could definitely join the list. How is Vysper related to Mina
(or is it)? I do remember Mina; it has been around for a while.

-+ Tatu +-

#48 From: "pnehrers2" <pnehrer@...>
Date: Sat Oct 24, 2009 2:51 am
Subject: Re: Anyone interested in helping with non-blocking/async parsing use cases?
pnehrers2
 
I hereby declare my interest in an asynchronous xml parser solution :-)

I'm looking for a way to efficiently process xml input in a Netty-based server.
Right now, I have to spin another thread and let it block on reading an input
stream while feeding it with bytes every time I get a new buffer-ful from the
socket... quite ugly.
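For reference, the workaround described above can look roughly like this with JDK classes only: the I/O side writes each received buffer into a `PipedOutputStream`, and a dedicated thread runs a blocking `XMLStreamReader` over the connected `PipedInputStream`. The class `PipedParserWorkaround` is hypothetical, and a fixed list of chunks stands in for the socket. The cost is one parked thread per connection, which is exactly what a non-blocking parser would avoid.

```java
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicReference;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical illustration of the blocking workaround: an extra thread parked inside a StAX reader. */
public class PipedParserWorkaround {

    public static List<String> parse(List<byte[]> chunks) throws Exception {
        PipedOutputStream toParser = new PipedOutputStream();
        PipedInputStream parserIn = new PipedInputStream(toParser, 8192);
        List<String> startTags = new CopyOnWriteArrayList<>();
        AtomicReference<Exception> failure = new AtomicReference<>();

        // The parser thread blocks inside next() whenever the pipe is empty.
        Thread parserThread = new Thread(() -> {
            try {
                XMLStreamReader r = XMLInputFactory.newInstance().createXMLStreamReader(parserIn, "UTF-8");
                while (r.hasNext()) {
                    if (r.next() == XMLStreamConstants.START_ELEMENT) startTags.add(r.getLocalName());
                }
                r.close();
            } catch (Exception e) {
                failure.set(e);
            }
        });
        parserThread.start();

        // Stand-in for the I/O thread: each buffer-ful from the socket is pushed into the pipe.
        for (byte[] chunk : chunks) {
            toParser.write(chunk);
            toParser.flush();
        }
        toParser.close();  // end of input for the parser thread
        parserThread.join();
        if (failure.get() != null) throw failure.get();
        return startTags;
    }

    public static void main(String[] args) throws Exception {
        List<byte[]> chunks = List.of(
                "<a><".getBytes(StandardCharsets.UTF_8),
                "b>x</".getBytes(StandardCharsets.UTF_8),
                "b><c/></a>".getBytes(StandardCharsets.UTF_8));
        System.out.println(parse(chunks));  // prints [a, b, c]
    }
}
```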

The problem with most parsers I investigated is that they are not written in a
way that would allow me to interrupt parsing (and essentially go back to the
last well-defined state) when no more bytes are available. This is basically how
Netty does packet defragmenting -- if you don't have enough bytes to move to the
next state, you stay in the current state and retry when you get more bytes.
There's probably a more efficient way -- like keeping a set of possible states
you can be in at any given byte, but that sounds more complicated.
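A rough sketch of the "stay in the current state and retry" approach, applied to splitting XML into tags and text runs: unconsumed bytes are kept, and an incomplete token is rescanned from its start whenever more bytes arrive. It is a simplification (it tracks quoted attribute values but does not understand comments, CDATA sections or processing instructions containing `>`), and the class name `RetryingTokenSplitter` is hypothetical. Scanning bytes rather than chars is safe for UTF-8, because bytes belonging to multi-byte sequences never equal ASCII `<`, `>` or quote characters.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical splitter: unconsumed bytes are kept and rescanned from the last mark on every feed. */
public class RetryingTokenSplitter {
    private byte[] buf = new byte[0];  // bytes received but not yet part of a complete token

    /** Appends a chunk and returns every token that is now complete; partial data stays buffered. */
    public List<String> feed(byte[] chunk) {
        byte[] joined = Arrays.copyOf(buf, buf.length + chunk.length);
        System.arraycopy(chunk, 0, joined, buf.length, chunk.length);
        List<String> tokens = new ArrayList<>();
        int mark = 0;  // last well-defined state: start of the first unemitted token
        while (mark < joined.length) {
            int end = tokenEnd(joined, mark);
            if (end < 0) break;  // not enough bytes: stay at mark, retry on the next feed
            tokens.add(new String(joined, mark, end - mark, StandardCharsets.UTF_8));
            mark = end;
        }
        buf = Arrays.copyOfRange(joined, mark, joined.length);
        return tokens;
    }

    /** Exclusive end of the token starting at start, or -1 if it is not complete yet. */
    private static int tokenEnd(byte[] b, int start) {
        if (b[start] == '<') {
            byte quote = 0;  // non-zero while inside a quoted attribute value
            for (int i = start + 1; i < b.length; i++) {
                if (quote != 0) {
                    if (b[i] == quote) quote = 0;
                } else if (b[i] == '"' || b[i] == '\'') {
                    quote = b[i];
                } else if (b[i] == '>') {
                    return i + 1;
                }
            }
            return -1;
        }
        // A text run is complete only once the next '<' is seen; otherwise more text may follow.
        for (int i = start; i < b.length; i++) {
            if (b[i] == '<') return i;
        }
        return -1;
    }

    public int buffered() { return buf.length; }

    public static void main(String[] args) {
        RetryingTokenSplitter s = new RetryingTokenSplitter();
        System.out.println(s.feed("<msg to='a>b'>hi".getBytes(StandardCharsets.UTF_8)));
        System.out.println(s.feed("</ms".getBytes(StandardCharsets.UTF_8)));
        System.out.println(s.feed("g>".getBytes(StandardCharsets.UTF_8)));
    }
}
```

The rescanning is the inefficiency alluded to above: a long tag that trickles in byte by byte is scanned again from its start on every call.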

If I had a StAX parser that would fail next() when it can't complete parsing
with currently available bytes and then let me retry the same next() again
later, I'd be all set :-)

Anyway, I'm not sure how I could help, but I'm interested (also, an
Apache-compatible license wouldn't hurt ;-))

--Peter

#49 From: Tatu Saloranta <tsaloranta@...>
Date: Sat Oct 24, 2009 3:28 am
Subject: Re: Re: Anyone interested in helping with non-blocking/async parsing use cases?
cowtowncoder
 
On Fri, Oct 23, 2009 at 7:51 PM, pnehrers2 <pnehrer@...> wrote:
>
> I hereby declare my interest in an asynchronous xml parser solution :-)

Great! It seems that there are a couple of other developers seriously
interested, so I think I'd better get back on track with development. :)

> I'm looking for a way to efficiently process xml input in a Netty-based
> server. Right now, I have to spin another thread and let it block on reading an
> input stream while feeding it with bytes every time I get a new buffer-ful from
> the socket... quite ugly.

Yup.

> The problem with most parsers I investigated is that they are not written in a
> way that would allow me to interrupt parsing (and essentially go back to the
> last well-defined state) when no more bytes are available.
> This is basically how Netty does packet defragmenting -- if you don't have
> enough bytes to move to the next state, you stay in the current state and retry
> when you get more bytes. There's probably a more efficient way -- like keeping
> a set of possible states you can be in at any given byte, but that sounds more
> complicated.

Yes indeed. Aalto's non-blocking mode is implemented to allow exactly
this: I have tested it by passing exactly one byte at a time,
ensuring it doesn't get confused. Implementing the parser this way is
quite a bit harder, but once it's done, it's rather neat; and speed for
regular, bigger chunks is not much worse than with blocking IO. (Why
potentially slower at all? Because of the extra bookkeeping needed to
retain state and to allow bailing out at any point.)

> If I had a StAX parser that would fail next() when it can't complete parsing
> with currently available bytes and then let me retry the same next() again
> later, I'd be all set :-)

The way I am thinking of doing this would be to return something like
XMLStreamConstants.NOT_YET_AVAILABLE, so that should work.
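A minimal sketch of that pull style, assuming a toy tokenizer that keeps its scanning state (partial token, inside-tag flag, open quote) between feeds instead of rescanning. `NOT_YET_AVAILABLE` here is a made-up constant modelled on the proposal above; it is not part of `javax.xml.stream.XMLStreamConstants`, and `ResumableTokenizer`, `feed` and `events` are hypothetical names, not Aalto's API. The other event codes reuse the real `XMLStreamConstants` values. It ignores namespaces, entities, comments and CDATA, and treats text as complete only once the next `<` arrives. `main` checks the property mentioned above: feeding one byte at a time yields the same events as feeding the whole document.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.stream.XMLStreamConstants;

/** Hypothetical toy pull tokenizer that keeps its scanning state between feeds. */
public class ResumableTokenizer {
    /** Hypothetical "come back with more bytes" event; NOT a real StAX constant. */
    public static final int NOT_YET_AVAILABLE = -1;

    static final String SAMPLE =
            "<stream:stream><message to='a>b'><body>h\u00e9</body></message><presence/>";

    private byte[] input = new byte[0];  // fed bytes not examined yet
    private int pos;
    private final ByteArrayOutputStream token = new ByteArrayOutputStream(); // partial token survives feeds
    private boolean inTag;
    private byte quote;                  // non-zero while inside a quoted attribute value
    private boolean pendingEmptyEnd;     // <x/> reports START_ELEMENT, then END_ELEMENT
    private String name;
    private String text;

    public void feed(byte[] b, int off, int len) {
        byte[] rest = Arrays.copyOfRange(input, pos, input.length);
        input = Arrays.copyOf(rest, rest.length + len);
        System.arraycopy(b, off, input, rest.length, len);
        pos = 0;
    }

    public int next() {
        if (pendingEmptyEnd) {
            pendingEmptyEnd = false;
            return XMLStreamConstants.END_ELEMENT;
        }
        while (pos < input.length) {
            byte c = input[pos];
            if (!inTag) {
                if (c == '<' && token.size() > 0) {  // text ends where the next tag begins
                    text = drain();
                    return XMLStreamConstants.CHARACTERS;  // '<' is left for the next call
                }
                pos++;
                if (c == '<') inTag = true; else token.write(c);
                continue;
            }
            pos++;
            if (quote != 0) {
                if (c == quote) quote = 0;
                token.write(c);
            } else if (c == '"' || c == '\'') {
                quote = c;
                token.write(c);
            } else if (c == '>') {
                inTag = false;
                int event = finishTag(drain());
                if (event != 0) return event;  // 0 means the tag was skipped
            } else {
                token.write(c);
            }
        }
        return NOT_YET_AVAILABLE;  // all input consumed; partial state waits for the next feed
    }

    public String getName() { return name; }

    public String getText() { return text; }

    private int finishTag(String body) {
        if (body.startsWith("/")) {
            name = body.substring(1).trim();
            return XMLStreamConstants.END_ELEMENT;
        }
        if (body.startsWith("?") || body.startsWith("!")) return 0;  // PIs, comments, DOCTYPE ignored
        name = body.split("[\\s/]+", 2)[0];
        pendingEmptyEnd = body.endsWith("/");
        return XMLStreamConstants.START_ELEMENT;
    }

    private String drain() {
        String s = new String(token.toByteArray(), StandardCharsets.UTF_8);
        token.reset();
        return s;
    }

    /** Feeds doc in chunks of chunkSize bytes and records every event. */
    static List<String> events(byte[] doc, int chunkSize) {
        ResumableTokenizer t = new ResumableTokenizer();
        List<String> out = new ArrayList<>();
        for (int off = 0; off < doc.length; off += chunkSize) {
            t.feed(doc, off, Math.min(chunkSize, doc.length - off));
            for (int ev = t.next(); ev != NOT_YET_AVAILABLE; ev = t.next()) {
                if (ev == XMLStreamConstants.CHARACTERS) out.add("CHARS:" + t.getText());
                else out.add((ev == XMLStreamConstants.START_ELEMENT ? "START:" : "END:") + t.getName());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] doc = SAMPLE.getBytes(StandardCharsets.UTF_8);
        // one byte at a time must produce the same events as a single big chunk
        System.out.println(events(doc, 1).equals(events(doc, doc.length)));
    }
}
```

Unlike the retry approach, nothing is rescanned: a tag split over many feeds is examined exactly once, at the price of carrying the extra state around.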

In addition, Aalto will implement the SAX interface as well, building on the Stax core.

> Anyway, I'm not sure how I could help, but I'm interested (also, an
> Apache-compatible license wouldn't hurt ;-))

If and when we get things going, the license issue will be resolved.
Initially, licensing should only matter for anyone who has to
distribute Aalto artifacts -- GPL does not bind end users. But we
realize that there are concerns regarding GPL.
There is just the question of how to make it possible for FasterXML to
offer a compelling business case for companies to use the commercial
license.

As to helping, what I could really use are just app/server
skeletons in which to plug the parser. I can prototype the API, publish
it, and let others play with it. I am just not good at writing "toy
apps"; right now I can work with the blocking version, so I generally
do that. But with a 'real' use case (something someone else has
written :) ) it is easier for me to get started on cleaning up the
non-blocking part.

Does this make sense?

-+ Tatu +-


Copyright © 2010 Yahoo! Inc. All rights reserved.