Nitin wrote:
> Hi,
> I have a requirement wherein I have to process a compressed file
> containing multiple XML files (let's say AllXMLs.tar.gz). I have to
> parse all the XML files without uncompressing them.
> I can use the classes available in the java.util.zip package to create
> an InputStream for the contents of the XML files, but can the XML Pull
> Parser process it?
>
> Currently, when I pass the InputStream to the parser, I get the
> following exception:
> Exception in thread "main" org.gjt.xpp.XmlPullParserException: only
> whitespace content allowed outside root element at line 2 and column 3
> seen "...<?xml version="1.0" encoding="ISO-8859-1"?>\n"...
That would indicate that, after decompression, your input contains
<?xml version="1.0" encoding="ISO-8859-1"?> at a point where the parser
allows only whitespace outside the root element. If it is a tar.gz, it is
not enough to decompress it: you also need to extract each file from the
tar archive.
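java.util.zip only removes the gzip layer; the tar layer still has to be walked. Below is a minimal sketch in plain Java, assuming ordinary tar headers (names under 100 bytes, octal size fields) and that it is fine to buffer each extracted file in memory. `TarGzEntries`, `FileEntry`, `readTarGz` and `buildTarGz` are hypothetical names, not part of any library; Apache Commons Compress (`TarArchiveInputStream`) is an existing alternative if a dependency is acceptable.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.*;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class TarGzEntries {

    /** One regular file taken out of the archive (hypothetical helper type). */
    public static final class FileEntry {
        public final String name;
        public final byte[] data;
        FileEntry(String name, byte[] data) { this.name = name; this.data = data; }
    }

    /** Gunzips the stream, then walks the 512-byte tar records, keeping regular files. */
    public static List<FileEntry> readTarGz(InputStream raw) throws IOException {
        List<FileEntry> entries = new ArrayList<>();
        try (InputStream in = new GZIPInputStream(raw)) {
            byte[] header = new byte[512];
            // a block of zeros (or plain EOF) marks the end of the archive
            while (readBlock(in, header) && !isZero(header)) {
                String name = cString(header, 0, 100);
                long size = octal(header, 124, 12);
                byte type = header[156];                    // '0' or NUL = regular file
                byte[] data = new byte[(int) size];
                int pad = (int) ((512 - size % 512) % 512); // file data is padded to whole blocks
                if ((size > 0 && !readBlock(in, data)) || (pad > 0 && !readBlock(in, new byte[pad]))) {
                    throw new EOFException("truncated entry " + name);
                }
                // directories, links, pax ('x'/'g') and GNU long-name ('L') records are skipped
                if (type == '0' || type == 0) entries.add(new FileEntry(name, data));
            }
        }
        return entries;
    }

    /** Self-test helper only: fills just the name, size and type fields readTarGz reads. */
    public static byte[] buildTarGz(Map<String, String> files) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            for (Map.Entry<String, String> f : files.entrySet()) {
                byte[] data = f.getValue().getBytes(StandardCharsets.UTF_8);
                byte[] h = new byte[512];
                byte[] name = f.getKey().getBytes(StandardCharsets.US_ASCII); // under 100 bytes
                System.arraycopy(name, 0, h, 0, name.length);
                byte[] size = String.format("%011o", data.length).getBytes(StandardCharsets.US_ASCII);
                System.arraycopy(size, 0, h, 124, size.length);
                h[156] = '0';
                gz.write(h);
                gz.write(data);
                gz.write(new byte[(512 - data.length % 512) % 512]);
            }
            gz.write(new byte[1024]); // two zero blocks close the archive
        }
        return bos.toByteArray();
    }

    // Fills buf completely; returns false only on EOF before the first byte.
    private static boolean readBlock(InputStream in, byte[] buf) throws IOException {
        int off = 0;
        while (off < buf.length) {
            int n = in.read(buf, off, buf.length - off);
            if (n < 0) {
                if (off == 0) return false;
                throw new EOFException("partial tar block");
            }
            off += n;
        }
        return true;
    }

    private static boolean isZero(byte[] b) {
        for (byte x : b) if (x != 0) return false;
        return true;
    }

    // NUL-terminated text field.
    private static String cString(byte[] h, int off, int len) {
        int end = off;
        while (end < off + len && h[end] != 0) end++;
        return new String(h, off, end - off, StandardCharsets.UTF_8);
    }

    // Octal number field, padded with spaces or NULs.
    private static long octal(byte[] h, int off, int len) {
        String s = new String(h, off, len, StandardCharsets.US_ASCII).replace('\0', ' ').trim();
        return s.isEmpty() ? 0 : Long.parseLong(s, 8);
    }
}
```

Each entry's bytes could then go to a fresh parser, one document at a time, e.g. `pp.setInput(new InputStreamReader(new ByteArrayInputStream(e.data), "ISO-8859-1"))`, taking the charset from the files' own declaration (ISO-8859-1 in the error above). `buildTarGz` only writes the fields `readTarGz` looks at, so it is meant for a quick self-test, not for producing archives for other tar tools.\n");
files.put("b.xml", "<?xml version=\"1.0\"?>\n<test/>\n");
java.util.List<TarGzEntries.FileEntry> es = TarGzEntries.readTarGz(new java.io.ByteArrayInputStream(TarGzEntries.buildTarGz(files)));
assert es.size() == 2;
assert es.get(0).name.equals("a.xml");
assert es.get(1).name.equals("b.xml");
assert new String(es.get(1).data, java.nio.charset.StandardCharsets.UTF_8).equals("<?xml version=\"1.0\"?>\n<test/>\n");
</test>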
To verify, try gzipping a single XML file and then parsing it; I bet it will work ;-)
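The same check can be done in memory. This sketch gzips a document, parses it back through `GZIPInputStream`, and then feeds two documents glued into one stream, which is roughly what several XML files look like once decompressed but not extracted (tar headers aside). It uses the JDK's built-in StAX parser (`javax.xml.stream`) in place of XPP only to stay self-contained; `GzipXmlCheck`, `gzip` and `countElements` are made-up names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class GzipXmlCheck {

    /** Gzips a string in memory, standing in for running gzip on a file. */
    public static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(text.getBytes(StandardCharsets.ISO_8859_1));
        }
        return bos.toByteArray();
    }

    /** Parses the gzipped stream to the end, counting start tags; throws if it is not one document. */
    public static int countElements(byte[] gzipped) throws IOException, XMLStreamException {
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(gzipped))) {
            XMLStreamReader r = XMLInputFactory.newInstance().createXMLStreamReader(in);
            int count = 0;
            while (r.hasNext()) {
                if (r.next() == XMLStreamReader.START_ELEMENT) count++;
            }
            r.close();
            return count;
        }
    }

    public static void main(String[] args) throws Exception {
        String doc = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>\n<test><a>1</a><b>2</b></test>\n";
        System.out.println(countElements(gzip(doc)));  // a single gzipped document parses fine
        try {
            countElements(gzip(doc + doc));            // two documents in one stream
        } catch (XMLStreamException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The gzip layer is not the problem: the parser accepts the single document, and only the concatenated stream is rejected once the first root element has closed.\n";
assert GzipXmlCheck.countElements(GzipXmlCheck.gzip(doc)) == 3;
boolean rejected = false;
try { GzipXmlCheck.countElements(GzipXmlCheck.gzip(doc + doc)); } catch (Exception e) { rejected = true; }
assert rejected;
</test>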
best,
alek
> (parser state CONTENT)
>     at org.gjt.xpp.impl.pullparser.PullParser.next(PullParser.java:429)
>     at ZipParser.readGZIPFile(ZipParser.java:62)
>     at ZipParser.main(ZipParser.java:17)
>
>
> My function looks something like this:
> private static void readGZIPFile(String fileName) throws Exception
> {
>     // use BufferedReader to get one line at a time
>     BufferedReader gzipReader = null;
>     XmlPullParserFactory factory = XmlPullParserFactory.newInstance();
>     XmlPullParser pp = factory.newPullParser();
>     pp.setAllowedMixedContent(true);
>
>     try
>     {
>         // simple loop to dump the contents to the console
>         gzipReader = new BufferedReader(new InputStreamReader(
>                 new GZIPInputStream(new FileInputStream(fileName))));
>         while (gzipReader.ready())
>         {
>             pp.setInput(gzipReader);
>
>             // input could be also taken from String directly:
>             //pp.setInput(data.toCharArray());
>
>             // 4. parsing
>
>             // declare variables used during parsing
>             XmlStartTag stag = factory.newStartTag();
>             XmlEndTag etag = factory.newEndTag();
>
>             byte type;     // received event type
>             byte prevType; // previous event type
>
>             type = prevType = pp.next();
>             if (type == XmlPullParser.START_TAG) {
>                 pp.readStartTag(stag);
>                 //System.err.println("read start tag " + stag);
>                 if (!"test".equals(stag.getLocalName())) {
>                     throw new RuntimeException("bulk data must start with test not "
>                             + stag.getLocalName() + pp.getPosDesc());
>                 }
>             } else {
>                 throw new RuntimeException("unexpected end of data " + pp.getPosDesc());
>             }
>
>             // start parsing loop
>             for (;;) {
>                 type = pp.next();
>                 if (type == XmlPullParser.START_TAG) {
>                     pp.readStartTag(stag);
>                     //System.err.println("read start tag " + stag);
>                     type = pp.next();
>                     String content = "";
>                     if (type == XmlPullParser.CONTENT) {
>                         content = pp.readContent();
>                         //System.err.println("read content=" + content);
>                         while (type != XmlPullParser.END_TAG) {
>                             try {
>                                 type = pp.next();
>                             } catch (Exception e) {
>                                 System.err.println("ERROR recovering from " + e);
>                                 // give it a second chance
>                                 //type = pp.next();
>                                 type = pp.getEventType();
>                             }
>                         }
>                     }
>                     if (type != XmlPullParser.END_TAG) {
>                         throw new RuntimeException("expected end tag not " + pp.getPosDesc());
>                     }
>                     System.err.println("LOAD tag=" + stag.getLocalName() + " data='" + content + "'");
>                 } else if (type == XmlPullParser.END_TAG) {
>                     break;
>                 } else if (type == XmlPullParser.END_DOCUMENT) {
>                     throw new RuntimeException("unexpected end of data " + pp.getPosDesc());
>                 } else {
>                     throw new RuntimeException("unknown event type: " + type);
>                 }
>             }
>         }
>         gzipReader.close();
>     }
>     catch (FileNotFoundException fnfe)
>     {
>         System.out.println("The file was not found: " + fnfe.getMessage());
>     }
>     catch (IOException ioe)
>     {
>         System.out.println("An IOException occurred: " + ioe.getMessage());
>     }
>     finally
>     {
>         if (gzipReader != null)
>         {
>             try
>             {
>                 gzipReader.close();
>             }
>             catch (IOException ioe)
>             {
>             }
>         }
>     }
> }
>
> Can you please advise?
--
The best way to predict the future is to invent it - Alan Kay