aalto-xml-interest · Aalto XML Parser (stax)
Implemented namespace-repairing mode, then need coalescing mode, for
Message #25 of 61
Ok, of the 2 missing Stax 1.0 features, one is now fully implemented. The
next release (0.9.4) will contain a fully functioning
namespace-repairing mode for XMLStreamWriter; it passes all staxtest
and stax2test test cases.

The next immediate task will be implementing the last major feature,
coalescing mode. After this, 1.0 could be finalized.

However, I think it would also make sense to add one more task for
1.0: formalizing the API used to feed the non-blocking variant of
XMLStreamReader. Unlike blocking readers, which can take an InputStream
or Reader (or references that can be used to create these), the
non-blocking reader will not read any of its input itself. Rather, the
calling app has to feed it a new chunk of content once the parser is
done with the current chunk.

Currently the non-blocking parser prototype works as follows:

---
// Just to generate the input; usually this would be NIO-based
InputStream in = new FileInputStream(file);
final byte[] buf = new byte[3000];

ReaderConfig cfg = new ReaderConfig();
cfg.setActualEncoding("UTF-8"); // no encoding auto-detect yet (will be added)

// will need a factory, can't use XMLInputFactory as is
AsyncUtfScanner asc = new AsyncUtfScanner(cfg);
StreamReaderImpl sr = new StreamReaderImpl(asc);

main_loop:
while (true) {
    int type;

    // We will feed chunked input 3 bytes at a time, for test/demo
    // purposes (even one byte would work)
    while ((type = sr.next()) == AsyncByteScanner.EVENT_INCOMPLETE) {
        int len = in.read(buf, 1, 3);
        if (len < 0) { // shouldn't happen in the middle of a partial token
            System.err.println("Error: Unexpected EOF");
            break main_loop;
        }
        asc.addInput(buf, 1, len);
    }
    // to trigger this, caller must signal actual end of input
    if (type == XMLStreamConstants.END_DOCUMENT) {
        break;
    }
    // otherwise, handle the token; all data is available without blocking
}
---

which clearly is not ready for production use; wires sticking out of the
rat's nest kinda box. :-)

But the basic idea is simple: the caller needs to handle the
EVENT_INCOMPLETE return type, feed more data, and indicate end of input
when appropriate (which may throw an exception etc), but otherwise work
normally. Once a non-incomplete event is returned, all data associated
with it will be available without blocking.
Memory usage will be bounded by the amount of memory needed for a single
event (plus some state for nesting), and specifically the length of
individual text segments will be limited to the chunk size that the
application gives. That is, CHARACTERS/CDATA is returned as soon as at
least one character has been decoded (and holds at most the contents of
the whole chunk passed).
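
To make the contract above concrete, here is a minimal, self-contained sketch of a chunk-fed tokenizer. It is a toy, not the Aalto prototype: the class ChunkFedTokenizer, its INCOMPLETE/TAG/TEXT/END constants and the tokenize() helper are all made up for illustration, input is assumed to be plain ASCII, and anything between '<' and '>' is treated as one tag. What it does show is the calling pattern: next() never blocks, the caller feeds a chunk whenever it sees INCOMPLETE, and text is handed out as soon as any is available but never beyond the current chunk.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Toy chunk-fed tokenizer (hypothetical; NOT Aalto's API). Assumes ASCII input
// and treats anything between '<' and '>' as one tag.
class ChunkFedTokenizer {
    static final int INCOMPLETE = 0, TAG = 1, TEXT = 2, END = 3;

    private byte[] chunk = new byte[0];
    private int pos = 0, end = 0;
    private boolean endOfInput = false;
    private boolean inTag = false;
    private final StringBuilder tag = new StringBuilder(); // partial tag kept across chunks
    private String value;

    // Caller hands over the next chunk; only legal once the previous one is used up.
    void addInput(byte[] buf, int offset, int len) {
        if (pos < end) throw new IllegalStateException("previous chunk not fully consumed");
        chunk = buf;
        pos = offset;
        end = offset + len;
    }

    void endOfInput() { endOfInput = true; }

    String getValue() { return value; }

    // Never blocks: returns INCOMPLETE when the current chunk cannot yield a full token.
    int next() {
        value = null;
        if (pos >= end) {
            if (!endOfInput) return INCOMPLETE;
            if (inTag) throw new IllegalStateException("unexpected end of input inside a tag");
            return END;
        }
        if (inTag || chunk[pos] == '<') {
            inTag = true;
            while (pos < end) {
                char c = (char) chunk[pos++];
                tag.append(c);
                if (c == '>') {
                    value = tag.toString();
                    tag.setLength(0);
                    inTag = false;
                    return TAG;
                }
            }
            return INCOMPLETE; // tag continues in the next chunk
        }
        // Text is returned as soon as any is available, never past the current chunk.
        int start = pos;
        while (pos < end && chunk[pos] != '<') pos++;
        value = new String(chunk, start, pos - start, StandardCharsets.US_ASCII);
        return TEXT;
    }

    // Drives the tokenizer the way a calling app would, chunkSize (> 0) bytes at a time.
    static List<String> tokenize(byte[] data, int chunkSize) {
        ChunkFedTokenizer t = new ChunkFedTokenizer();
        List<String> events = new ArrayList<>();
        int fed = 0;
        while (true) {
            int type = t.next();
            if (type == INCOMPLETE) {
                if (fed < data.length) {
                    int len = Math.min(chunkSize, data.length - fed);
                    t.addInput(data, fed, len);
                    fed += len;
                } else {
                    t.endOfInput(); // no more data: signal the real end of input
                }
            } else if (type == END) {
                return events;
            } else {
                events.add((type == TAG ? "TAG:" : "TEXT:") + t.getValue());
            }
        }
    }

    public static void main(String[] args) {
        byte[] doc = "<a>hello</a>".getBytes(StandardCharsets.US_ASCII);
        System.out.println(tokenize(doc, 3));
    }
}
```

Fed 3 bytes at a time, "<a>hello</a>" comes out as [TAG:<a>, TEXT:hel, TEXT:lo, TAG:</a>]: the text is split at chunk boundaries, while the end tag that straddles two chunks is only reported once it is complete.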

Using such a non-blocking parser, it should be quite easy to build a
single-threaded (or N-threaded, for N cores/CPUs) XML input handling
server, one that would perform nicely and could apply elaborate
throttling if need be.

One more thing that would be good to investigate is how easy it would
be to implement the SAX API on top of the non-blocking stream reader.
That should not be very hard -- the blocking stream reader can already
be used as a SAX parser via JAXP (or directly).
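
As a rough illustration of what a SAX-style front end over chunk-fed input could look like, here is a small self-contained sketch. It is only a guess at the shape, not how Aalto does or would do it: PushSaxFeeder, feed(), endOfInput() and run() are hypothetical names, input is assumed to be ASCII, and tags are bare names with no attributes or namespaces. The only real API used is org.xml.sax (ContentHandler, DefaultHandler, AttributesImpl). The point is that with a push API the "incomplete" state needs no return code: feed() fires callbacks for whatever is complete in the chunk and then simply returns.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

// Toy push-style feeder (hypothetical; NOT Aalto's API). ASCII only; tags are
// bare names without attributes or namespaces.
class PushSaxFeeder {
    private final ContentHandler handler;
    private final StringBuilder tag = new StringBuilder(); // partial tag name kept across chunks
    private boolean inTag = false;
    private boolean started = false;

    PushSaxFeeder(ContentHandler handler) { this.handler = handler; }

    // Fires SAX callbacks for everything complete within this chunk, then returns.
    void feed(byte[] buf, int offset, int len) throws SAXException {
        if (!started) { handler.startDocument(); started = true; }
        int pos = offset, end = offset + len;
        while (pos < end) {
            if (inTag) {
                char c = (char) buf[pos++];
                if (c != '>') { tag.append(c); continue; }
                inTag = false;
                String name = tag.toString();
                tag.setLength(0);
                if (name.startsWith("/")) {
                    handler.endElement("", name.substring(1), name.substring(1));
                } else {
                    handler.startElement("", name, name, new AttributesImpl());
                }
            } else if (buf[pos] == '<') {
                inTag = true;
                pos++;
            } else {
                // Text available in this chunk is reported right away.
                int start = pos;
                while (pos < end && buf[pos] != '<') pos++;
                char[] text = new String(buf, start, pos - start, StandardCharsets.US_ASCII).toCharArray();
                handler.characters(text, 0, text.length);
            }
        }
    }

    // Caller signals the actual end of input; a dangling tag is an error.
    void endOfInput() throws SAXException {
        if (inTag) throw new SAXException("unexpected end of input inside a tag");
        handler.endDocument();
    }

    // Demo: record callbacks while feeding chunkSize (> 0) bytes at a time.
    static List<String> run(byte[] data, int chunkSize) {
        final List<String> events = new ArrayList<>();
        PushSaxFeeder feeder = new PushSaxFeeder(new DefaultHandler() {
            @Override public void startElement(String uri, String local, String qName, Attributes atts) {
                events.add("start:" + qName);
            }
            @Override public void endElement(String uri, String local, String qName) {
                events.add("end:" + qName);
            }
            @Override public void characters(char[] ch, int start, int length) {
                events.add("text:" + new String(ch, start, length));
            }
        });
        try {
            for (int off = 0; off < data.length; off += chunkSize) {
                feeder.feed(data, off, Math.min(chunkSize, data.length - off));
            }
            feeder.endOfInput();
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
        return events;
    }

    public static void main(String[] args) {
        System.out.println(run("<a>hello</a>".getBytes(StandardCharsets.US_ASCII), 3));
    }
}
```

With 3-byte chunks the handler sees [start:a, text:hel, text:lo, end:a], so characters() may be called several times for one text run, which SAX already allows.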

Thoughts, comments, suggestions?

-+ Tatu +-



Thu Jan 29, 2009 5:58 pm

cowtowncoder
Tatu Saloranta