xmlpull-user
XML Pull Parsers for a of compressed file containing a set of XML f
Message #279 of 308
Hi,
I have a requirement wherein I have to process a compressed file
containing multiple XML files (let's say AllXMLs.tar.gz). I have to
parse all the XML files without uncompressing them.
I can use the classes available in the java.util.zip package to create
an InputStream for the contents of the XML files, but can the XML Pull
parser process it?

Currently when I am passing the Input Stream to the Parser, I get
the following exception:
Exception in thread "main" org.gjt.xpp.XmlPullParserException: only whitespace content allowed outside root element at line 2 and column 3 seen "...<?xml version="1.0" encoding="ISO-8859-1"?>\n"... (parser state CONTENT)
    at org.gjt.xpp.impl.pullparser.PullParser.next(PullParser.java:429)
    at ZipParser.readGZIPFile(ZipParser.java:62)
    at ZipParser.main(ZipParser.java:17)
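
One way to see what the parser is actually being given is to print the first few hundred bytes of the decompressed stream. The following is a minimal sketch using only java.util.zip; the class name PeekGzip and the method peek are made up for illustration and are not part of the original post. For a tar.gz one would expect the tar header (member name followed by octal fields) to show up ahead of the first <?xml declaration, because GZIPInputStream removes only the gzip layer.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;

// Hypothetical helper, not from the original post.
class PeekGzip {

    /**
     * Decompresses the gzip stream and returns up to max bytes as text,
     * replacing every non-printable byte with '.'.
     */
    static String peek(InputStream gzipped, int max) throws IOException {
        try (InputStream in = new GZIPInputStream(gzipped)) {
            byte[] buf = new byte[max];
            int total = 0;
            int n;
            // read() may return fewer bytes than requested, so loop until full or EOF
            while (total < max && (n = in.read(buf, total, max - total)) != -1) {
                total += n;
            }
            StringBuilder sb = new StringBuilder(total);
            for (int i = 0; i < total; i++) {
                int b = buf[i] & 0xFF;
                sb.append(b >= 0x20 && b < 0x7F ? (char) b : '.');
            }
            return sb.toString();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: java PeekGzip <file.gz>");
            return;
        }
        try (InputStream in = new FileInputStream(args[0])) {
            System.out.println(peek(in, 600));
        }
    }
}
```

If the dump starts with a file name and runs of digits rather than with <?xml, the parser is reading archive metadata as document content.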


My function looks something like this:
private static void readGZIPFile(String fileName) throws Exception
{
    // use BufferedReader to get one line at a time
    BufferedReader gzipReader = null;
    XmlPullParserFactory factory = XmlPullParserFactory.newInstance();
    XmlPullParser pp = factory.newPullParser();
    pp.setAllowedMixedContent(true);

    try
    {
        // simple loop to dump the contents to the console
        gzipReader = new BufferedReader(new InputStreamReader(
            new GZIPInputStream(new FileInputStream(fileName))));
        while (gzipReader.ready())
        {
            pp.setInput(gzipReader);

            // input could be also taken from String directly:
            //pp.setInput(data.toCharArray());

            // 4. parsing

            // declare variables used during parsing
            XmlStartTag stag = factory.newStartTag();
            XmlEndTag etag = factory.newEndTag();

            byte type;     // received event type
            byte prevType; // previous event type

            type = prevType = pp.next();
            if (type == XmlPullParser.START_TAG) {
                pp.readStartTag(stag);
                //System.err.println("read start tag "+stag);
                if (!"test".equals(stag.getLocalName())) {
                    throw new RuntimeException("bulk data must start with test not "
                        + stag.getLocalName() + pp.getPosDesc());
                }
            } else {
                throw new RuntimeException("unexpected end of data " + pp.getPosDesc());
            }

            // start parsing loop
            for (;;) {
                type = pp.next();
                if (type == XmlPullParser.START_TAG) {
                    pp.readStartTag(stag);
                    //System.err.println("read start tag "+stag);
                    type = pp.next();
                    String content = "";
                    if (type == XmlPullParser.CONTENT) {
                        content = pp.readContent();
                        //System.err.println("read content="+content);
                        while (type != XmlPullParser.END_TAG) {
                            try {
                                type = pp.next();
                            }
                            catch (Exception e) {
                                System.err.println("ERROR recovering from " + e);
                                // give it a second chance
                                //type = pp.next();
                                type = pp.getEventType();
                            }
                        }
                    }
                    if (type != XmlPullParser.END_TAG) {
                        throw new RuntimeException("expected end tag not " + pp.getPosDesc());
                    }
                    System.err.println("LOAD tag=" + stag.getLocalName() + " data='" + content + "'");
                } else if (type == XmlPullParser.END_TAG) {
                    break;
                } else if (type == XmlPullParser.END_DOCUMENT) {
                    throw new RuntimeException("unexpected end of data " + pp.getPosDesc());
                } else {
                    throw new RuntimeException("unknown event type: " + type);
                }
            }
        }
        gzipReader.close();
    }
    catch (FileNotFoundException fnfe)
    {
        System.out.println("The file was not found: " + fnfe.getMessage());
    }
    catch (IOException ioe)
    {
        System.out.println("An IOException occurred: " + ioe.getMessage());
    }
    finally
    {
        if (gzipReader != null)
        {
            try
            {
                gzipReader.close();
            }
            catch (IOException ioe)
            {
            }
        }
    }
}

Can you please advise?

Tue Oct 17, 2006 4:17 am

ndnair1979

Hi, I have a requirement wherein I have to process a compressed file containing multiple xml files (lets say AllXMLs.tar.gz)....I have to parse all the XML...
Nitin
ndnair1979
Oct 17, 2006
4:57 am

... that would indicate that there is something <?xml version="1.0" encoding="ISO-8859-1"?> in your input after decompressing - if it is tar.gz it is not enough...
Aleksander Slominski
as10m
Oct 17, 2006
5:33 am
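
The reply's point is that a tar.gz has two layers: java.util.zip can remove the gzip layer, but the result is still a tar archive, and the JDK has no tar reader. As an illustration only, here is a rough sketch that walks the tar layer by hand so that each member could be handed to the parser as its own document. The class TarGzWalker and the EntryHandler callback are hypothetical names, not from the thread. The sketch assumes plain tar headers (512-byte blocks, name at offset 0, octal size at offset 124, type flag at offset 156), loads each member fully into memory, and ignores long-name extensions, pax headers, and checksums. A maintained library such as Apache Commons Compress (TarArchiveInputStream) would be the more robust choice.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;

// Hypothetical sketch, not from the original thread.
class TarGzWalker {

    /** Hypothetical callback: receives one regular-file member at a time. */
    interface EntryHandler {
        void handle(String name, byte[] content) throws IOException;
    }

    /** Gunzips the stream, then walks the tar blocks and reports each regular file. */
    static void walk(InputStream tarGz, EntryHandler handler) throws IOException {
        DataInputStream in = new DataInputStream(new GZIPInputStream(tarGz));
        byte[] header = new byte[512];
        while (true) {
            try {
                in.readFully(header);
            } catch (EOFException e) {
                return; // stream ended without the usual zero blocks
            }
            if (header[0] == 0) {
                return; // a header with an empty name marks the end of the archive
            }
            String name = field(header, 0, 100);
            String sizeField = field(header, 124, 12).trim();
            long size = sizeField.isEmpty() ? 0 : Long.parseLong(sizeField, 8);
            byte type = header[156];

            // Assumption: members are small enough to hold in memory.
            byte[] content = new byte[(int) size];
            in.readFully(content);
            // Member data is padded with zeros up to the next 512-byte boundary.
            int padding = (int) ((512 - size % 512) % 512);
            in.readFully(new byte[padding]);

            if (type == '0' || type == 0) { // regular file
                handler.handle(name, content);
            }
        }
    }

    /** Reads a NUL-terminated ASCII field from a header block. */
    static String field(byte[] block, int offset, int length) {
        int end = offset;
        while (end < offset + length && block[end] != 0) {
            end++;
        }
        return new String(block, offset, end - offset, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: java TarGzWalker <archive.tar.gz>");
            return;
        }
        try (InputStream in = new FileInputStream(args[0])) {
            walk(in, (name, content) ->
                System.out.println(name + ": " + content.length + " bytes"));
        }
    }
}
```

Inside the handler, each member could be wrapped as new InputStreamReader(new ByteArrayInputStream(content), "ISO-8859-1") and passed to pp.setInput(...), one document per call, so the parser never sees a second XML declaration or tar metadata in the same input. Whether one parser instance can be reused across documents is not something this thread settles."}
};
for (String[] m : members) {
    byte[] data = m[1].getBytes(java.nio.charset.StandardCharsets.US_ASCII);
    byte[] header = new byte[512];
    byte[] name = m[0].getBytes(java.nio.charset.StandardCharsets.US_ASCII);
    System.arraycopy(name, 0, header, 0, name.length);
    byte[] size = String.format("%011o", data.length).getBytes(java.nio.charset.StandardCharsets.US_ASCII);
    System.arraycopy(size, 0, header, 124, size.length);
    header[156] = '0';
    raw.write(header, 0, header.length);
    raw.write(data, 0, data.length);
    byte[] pad = new byte[(512 - data.length % 512) % 512];
    raw.write(pad, 0, pad.length);
}
byte[] trailer = new byte[1024];
raw.write(trailer, 0, trailer.length);
java.io.ByteArrayOutputStream gz = new java.io.ByteArrayOutputStream();
try (java.util.zip.GZIPOutputStream out = new java.util.zip.GZIPOutputStream(gz)) {
    out.write(raw.toByteArray());
}
java.util.List<String> seen = new java.util.ArrayList<>();
TarGzWalker.walk(new java.io.ByteArrayInputStream(gz.toByteArray()),
    (n, c) -> seen.add(n + "=" + new String(c, java.nio.charset.StandardCharsets.US_ASCII)));
java.util.List<String> expected = java.util.Arrays.asList(
    "a.xml=<?xml version=\"1.0\"?>\n<test/>", "b.xml=<test>x</test>");
if (!seen.equals(expected)) throw new AssertionError(seen.toString());
</test>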