Hi,
I have a requirement to process a compressed file containing multiple XML files (let's say AllXMLs.tar.gz). I have to parse all the XML files without extracting them to disk first.

I can use the classes in the java.util.zip package to create an InputStream for the contents of the XML files, but can the XML Pull Parser (org.gjt.xpp) process it?
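A point about the layering that may matter here: java.util.zip's GZIPInputStream removes only the gzip compression. What comes out is still a tar archive, and the JDK has no tar reader, so the stream is not "the contents of the XML files" but tar headers plus every file back to back. The sketch below walks the tar entries by hand so each XML file can be handed to a parser on its own. It assumes a plain POSIX/ustar archive: names fit in the 100-byte name field, sizes are stored as octal text, and there are no GNU long-name or pax extension headers. The class and method names (TarGzEntries, readEntries) are hypothetical, and each entry is buffered in memory for simplicity.

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.GZIPInputStream;

// Hypothetical helper: lists the regular files inside a .tar.gz using only the JDK.
// Assumes a plain POSIX/ustar archive (no GNU long names, no pax headers).
public class TarGzEntries {
    private static final int BLOCK = 512;

    // Returns entry name -> entry bytes, in archive order. The caller closes tarGz.
    public static Map<String, byte[]> readEntries(InputStream tarGz) throws IOException {
        Map<String, byte[]> entries = new LinkedHashMap<>();
        // GZIPInputStream strips the gzip layer; what remains is the raw tar stream.
        DataInputStream tar = new DataInputStream(
                new BufferedInputStream(new GZIPInputStream(tarGz)));
        byte[] header = new byte[BLOCK];
        byte[] padding = new byte[BLOCK];
        while (true) {
            tar.readFully(header);              // every tar header is one 512-byte block
            if (isAllZero(header)) {
                break;                          // a zero block marks the end of the archive
            }
            String name = field(header, 0, 100);
            long size = Long.parseLong(field(header, 124, 12).trim(), 8); // octal text
            byte typeFlag = header[156];        // '0' (or NUL in old tars) = regular file
            byte[] data = new byte[(int) size];
            tar.readFully(data);
            int pad = (int) ((BLOCK - size % BLOCK) % BLOCK); // data is padded to a block boundary
            tar.readFully(padding, 0, pad);
            if (typeFlag == '0' || typeFlag == 0) {
                entries.put(name, data);        // directories and links are skipped
            }
        }
        return entries;
    }

    // Reads a NUL-terminated ASCII field from the header block.
    private static String field(byte[] header, int offset, int length) {
        int end = offset;
        while (end < offset + length && header[end] != 0) {
            end++;
        }
        return new String(header, offset, end - offset, StandardCharsets.US_ASCII);
    }

    private static boolean isAllZero(byte[] block) {
        for (byte b : block) {
            if (b != 0) {
                return false;
            }
        }
        return true;
    }
}
```

Each entry could then be given to the parser through its own setInput call, for example pp.setInput(new InputStreamReader(new ByteArrayInputStream(bytes), "ISO-8859-1")), so that every file is parsed as a separate document. The ISO-8859-1 charset is taken from the declaration in the error message and is an assumption for the other files. Buffering each entry is fine for modest XML files; for very large ones, a stream limited to the entry's size would avoid holding the whole file in memory.", "<test><b>2</b></test>"};
    byte[] tar = new byte[names.length * 1024 + 1024];
    for (int i = 0; i < names.length; i++) {
        int base = i * 1024;
        byte[] n = names[i].getBytes(ascii);
        System.arraycopy(n, 0, tar, base, n.length);
        byte[] d = docs[i].getBytes(ascii);
        byte[] s = String.format("%011o", d.length).getBytes(ascii);
        System.arraycopy(s, 0, tar, base + 124, s.length);
        tar[base + 156] = '0';
        System.arraycopy(d, 0, tar, base + 512, d.length);
    }
    java.io.ByteArrayOutputStream bos = new java.io.ByteArrayOutputStream();
    try (java.util.zip.GZIPOutputStream gz = new java.util.zip.GZIPOutputStream(bos)) {
        gz.write(tar);
    }
    java.util.Map<String, byte[]> m = TarGzEntries.readEntries(new java.io.ByteArrayInputStream(bos.toByteArray()));
    if (!new java.util.ArrayList<>(m.keySet()).equals(java.util.Arrays.asList("one.xml", "two.xml"))) throw new AssertionError(m.keySet());
    if (!new String(m.get("one.xml"), ascii).equals(docs[0])) throw new AssertionError("one.xml");
    if (!new String(m.get("two.xml"), ascii).equals(docs[1])) throw new AssertionError("two.xml");
} catch (java.io.IOException e) {
    throw new RuntimeException(e);
}
</test>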
Currently, when I pass the decompressed stream (wrapped in a BufferedReader) to the parser, I get the following exception:
Exception in thread "main" org.gjt.xpp.XmlPullParserException: only whitespace content allowed outside root element at line 2 and column 3 seen "...<?xml version="1.0" encoding="ISO-8859-1"?>\n"... (parser state CONTENT)
        at org.gjt.xpp.impl.pullparser.PullParser.next(PullParser.java:429)
        at ZipParser.readGZIPFile(ZipParser.java:62)
        at ZipParser.main(ZipParser.java:17)
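The message is consistent with the parser receiving more than one document in a single stream: after GZIPInputStream, the reader most likely delivers the raw tar data (binary headers followed by each XML file in turn), so the parser meets non-whitespace text and another <?xml ...?> declaration where a single document allows only whitespace. The sketch below illustrates that effect using the JDK's built-in StAX pull parser (javax.xml.stream) as a stand-in for org.gjt.xpp, so it runs without extra jars; the class name OneDocumentPerParser is hypothetical. Two documents joined into one stream are rejected, while the same text parses cleanly when each document gets its own parser input.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class; StAX stands in for the org.gjt.xpp parser from the post.
public class OneDocumentPerParser {
    private static final XMLInputFactory FACTORY = XMLInputFactory.newInstance();

    // Counts start tags; throws XMLStreamException unless the text is one well-formed document.
    public static int countStartTags(String xml) throws XMLStreamException {
        XMLStreamReader reader = FACTORY.createXMLStreamReader(new StringReader(xml));
        int count = 0;
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    count++;
                }
            }
        } finally {
            reader.close();
        }
        return count;
    }

    public static void main(String[] args) throws XMLStreamException {
        String first = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>\n<test><a>1</a></test>\n";
        String second = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>\n<test><b>2</b></test>\n";

        // One parser input per document: both parse.
        System.out.println(countStartTags(first) + countStartTags(second));

        // Both documents in one stream, as when reading straight through the archive: rejected.
        try {
            countStartTags(first + second);
        } catch (XMLStreamException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The same idea applies to the parser in the post: a fresh setInput per XML file rather than one setInput over the whole decompressed archive.\n";
    String b = "<?xml version=\"1.0\"?>\n<test><b>2</b></test>\n";
    if (OneDocumentPerParser.countStartTags(a) != 2) throw new AssertionError("a");
    if (OneDocumentPerParser.countStartTags(b) != 2) throw new AssertionError("b");
    boolean failed = false;
    try {
        OneDocumentPerParser.countStartTags(a + b);
    } catch (javax.xml.stream.XMLStreamException e) {
        failed = true;
    }
    if (!failed) throw new AssertionError("concatenated documents should not parse");
} catch (javax.xml.stream.XMLStreamException e) {
    throw new RuntimeException(e);
}
</test>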
My function looks something like this:
private static void readGZIPFile(String fileName) throws Exception
{
    // use BufferedReader to get one line at a time
    BufferedReader gzipReader = null;
    XmlPullParserFactory factory = XmlPullParserFactory.newInstance();
    XmlPullParser pp = factory.newPullParser();
    pp.setAllowedMixedContent(true);
    try
    {
        // simple loop to dump the contents to the console
        gzipReader = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(new FileInputStream(fileName))));
        while (gzipReader.ready())
        {
            pp.setInput(gzipReader);
            // input could be also taken from String directly:
            //pp.setInput(data.toCharArray());
            // 4. parsing
            // declare variables used during parsing
            XmlStartTag stag = factory.newStartTag();
            XmlEndTag etag = factory.newEndTag();
            byte type;     // received event type
            byte prevType; // previous event type
            type = prevType = pp.next();
            if (type == XmlPullParser.START_TAG) {
                pp.readStartTag(stag);
                //System.err.println("read start tag "+stag);
                if (!"test".equals(stag.getLocalName())) {
                    throw new RuntimeException("bulk data must start with test not "
                            + stag.getLocalName() + pp.getPosDesc());
                }
            } else {
                throw new RuntimeException("unexpected end of data " + pp.getPosDesc());
            }
            // start parsing loop
            for (;;) {
                type = pp.next();
                if (type == XmlPullParser.START_TAG) {
                    pp.readStartTag(stag);
                    //System.err.println("read start tag "+stag);
                    type = pp.next();
                    String content = "";
                    if (type == XmlPullParser.CONTENT) {
                        content = pp.readContent();
                        //System.err.println("read content="+content);
                        while (type != XmlPullParser.END_TAG) {
                            try {
                                type = pp.next();
                            } catch (Exception e) {
                                System.err.println("ERROR recovering from " + e);
                                // give it a second chance
                                //type = pp.next();
                                type = pp.getEventType();
                            }
                        }
                    }
                    if (type != XmlPullParser.END_TAG) {
                        throw new RuntimeException("expected end tag not " + pp.getPosDesc());
                    }
                    System.err.println("LOAD tag=" + stag.getLocalName() + " data='" + content + "'");
                } else if (type == XmlPullParser.END_TAG) {
                    break;
                } else if (type == XmlPullParser.END_DOCUMENT) {
                    throw new RuntimeException("unexpected end of data " + pp.getPosDesc());
                } else {
                    throw new RuntimeException("unknown event type: " + type);
                }
            }
        }
        gzipReader.close();
    }
    catch (FileNotFoundException fnfe)
    {
        System.out.println("The file was not found: " + fnfe.getMessage());
    }
    catch (IOException ioe)
    {
        System.out.println("An IOException occurred: " + ioe.getMessage());
    }
    finally
    {
        if (gzipReader != null)
        {
            try
            {
                gzipReader.close();
            }
            catch (IOException ioe)
            {
            }
        }
    }
}
Can you please advise?