Message #280 of 308
Re: [xmlpull-user] XML Pull Parsers for a compressed file containing a set of XML files

Nitin wrote:
> Hi,
> I have a requirement wherein I have to process a compressed file
> containing multiple XML files (let's say AllXMLs.tar.gz). I have to
> parse all the XML files without uncompressing them.
> I can use the classes available in the java.util.zip package to create
> an InputStream for the contents of the XML files, but can the XML Pull
> Parser process it?
>
> Currently, when I pass the InputStream to the parser, I get
> the following exception:
> Exception in thread "main" org.gjt.xpp.XmlPullParserException: only
> whitespace content allowed outside root element at
> line 2 and column 3 seen "...<?xml version="1.0" encoding="ISO-8859-1"?>\n"...
That would indicate that, after decompression, your input contains another
<?xml version="1.0" encoding="ISO-8859-1"?> declaration after the root
element. If it is a tar.gz, decompressing is not enough: you also need to
extract each file from the tar archive.

To verify, try gzipping a single XML file and then parsing it; I bet it will work ;-)
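A small sketch of that check, done in memory: it gzips one XML document and pulls it through a parser, then does the same with two documents concatenated into one gzip stream, which reproduces the kind of "second `<?xml ...?>` after the root element" failure quoted above. It uses the JDK's built-in StAX pull parser (`javax.xml.stream`) as a stand-in for XPP2, whose API is not reproduced here; the class name `GzipParseCheck` and the sample document are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical check class; StAX is swapped in here for the XPP2 parser used in the thread.
public class GzipParseCheck {
    /** Gzips the given text in memory, standing in for running gzip on an XML file. */
    public static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(text.getBytes(StandardCharsets.ISO_8859_1));
        }
        return bytes.toByteArray();
    }

    /** Decompresses on the fly, pulls every parser event and counts start tags. */
    public static int countStartTags(byte[] gz) throws IOException, XMLStreamException {
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(gz))) {
            XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
            int count = 0;
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    count++;
                }
            }
            reader.close();
            return count;
        }
    }

    public static void main(String[] args) throws Exception {
        String doc = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>\n<test><a>1</a></test>\n";
        // A single gzipped document parses cleanly.
        System.out.println("single document: " + countStartTags(gzip(doc)) + " start tags");
        // Two documents back to back in one stream are rejected at the second declaration.
        try {
            countStartTags(gzip(doc + doc));
            System.out.println("concatenated documents: parsed");
        } catch (XMLStreamException e) {
            System.out.println("concatenated documents: rejected (" + e.getMessage() + ")");
        }
    }
}
```

On a stock JDK the single document should report 2 start tags, while the concatenated stream is rejected once the parser reaches the second `<?xml ...?>`, because a well-formed document has one root element and an XML declaration only at its very start. Gzip itself is transparent to the parser; the problem is only that a tar archive packs several documents (plus tar headers) into one stream.\n";
if (GzipParseCheck.countStartTags(GzipParseCheck.gzip(doc)) != 2) throw new AssertionError("single gzipped document should yield 2 start tags");
boolean rejected = false;
try {
    GzipParseCheck.countStartTags(GzipParseCheck.gzip(doc + doc));
} catch (javax.xml.stream.XMLStreamException e) {
    rejected = true;
}
if (!rejected) throw new AssertionError("concatenated documents should be rejected");
</test>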

best,

alek
> (parser state CONTENT)
>     at org.gjt.xpp.impl.pullparser.PullParser.next(PullParser.java:429)
>     at ZipParser.readGZIPFile(ZipParser.java:62)
>     at ZipParser.main(ZipParser.java:17)
>
>
> My function looks something like this:
> private static void readGZIPFile(String fileName) throws Exception
> {
>     // use BufferedReader to get one line at a time
>     BufferedReader gzipReader = null;
>     XmlPullParserFactory factory = XmlPullParserFactory.newInstance();
>     XmlPullParser pp = factory.newPullParser();
>     pp.setAllowedMixedContent(true);
>
>     try
>     {
>         // simple loop to dump the contents to the console
>         gzipReader = new BufferedReader(new InputStreamReader(
>                 new GZIPInputStream(new FileInputStream(fileName))));
>         while (gzipReader.ready())
>         {
>             pp.setInput(gzipReader);
>
>             // input could be also taken from String directly:
>             //pp.setInput(data.toCharArray());
>
>             // 4. parsing
>
>             // declare variables used during parsing
>             XmlStartTag stag = factory.newStartTag();
>             XmlEndTag etag = factory.newEndTag();
>
>             byte type;     // received event type
>             byte prevType; // previous event type
>
>             type = prevType = pp.next();
>             if (type == XmlPullParser.START_TAG) {
>                 pp.readStartTag(stag);
>                 //System.err.println("read start tag "+stag);
>                 if (!"test".equals(stag.getLocalName())) {
>                     throw new RuntimeException("bulk data must start with test not "
>                             + stag.getLocalName() + pp.getPosDesc());
>                 }
>             } else {
>                 throw new RuntimeException("unexpected end of data " + pp.getPosDesc());
>             }
>
>             // start parsing loop
>             for (;;) {
>                 type = pp.next();
>                 if (type == XmlPullParser.START_TAG) {
>                     pp.readStartTag(stag);
>                     //System.err.println("read start tag "+stag);
>                     type = pp.next();
>                     String content = "";
>                     if (type == XmlPullParser.CONTENT) {
>                         content = pp.readContent();
>                         //System.err.println("read content="+content);
>                         while (type != XmlPullParser.END_TAG) {
>                             try {
>                                 type = pp.next();
>                             } catch (Exception e) {
>                                 System.err.println("ERROR recovering from " + e);
>                                 // give it a second chance
>                                 //type = pp.next();
>                                 type = pp.getEventType();
>                             }
>                         }
>                     }
>                     if (type != XmlPullParser.END_TAG) {
>                         throw new RuntimeException("expected end tag not " + pp.getPosDesc());
>                     }
>                     System.err.println("LOAD tag=" + stag.getLocalName() + " data='" + content + "'");
>                 } else if (type == XmlPullParser.END_TAG) {
>                     break;
>                 } else if (type == XmlPullParser.END_DOCUMENT) {
>                     throw new RuntimeException("unexpected end of data " + pp.getPosDesc());
>                 } else {
>                     throw new RuntimeException("unknown event type: " + type);
>                 }
>             }
>         }
>         gzipReader.close();
>     }
>     catch (FileNotFoundException fnfe)
>     {
>         System.out.println("The file was not found: " + fnfe.getMessage());
>     }
>     catch (IOException ioe)
>     {
>         System.out.println("An IOException occurred: " + ioe.getMessage());
>     }
>     finally
>     {
>         if (gzipReader != null)
>         {
>             try
>             {
>                 gzipReader.close();
>             }
>             catch (IOException ioe)
>             {
>             }
>         }
>     }
> }
>
> Can you please advise?


--
The best way to predict the future is to invent it - Alan Kay




Tue Oct 17, 2006 5:32 am

as10m

Messages in this thread:
Nitin (ndnair1979), Oct 17, 2006 4:57 am

Aleksander Slominski (as10m), Oct 17, 2006 5:33 am