Hey All,
I have a problem parsing websites with GRDDL.
I use this code:
import com.hp.hpl.jena.rdf.model.*;
// Load the page into a fresh model via the GRDDL reader
Model m = ModelFactory.createDefaultModel();
RDFReader r = m.getReader("GRDDL");
r.setProperty("grddl.rdfa", true);
r.read(m, "some website...");
This throws the exception pasted at the end.
Most of the time, when I download the HTML file locally and delete the first
line (the DOCTYPE declaration), it works just fine. But how can I avoid the
error when reading the page directly?
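For reference, the manual workaround described above (deleting the DOCTYPE line) could be automated roughly as follows. This is only a sketch under assumptions: the class name DoctypeStripper and the method stripDoctype are made up for illustration, the page is assumed to be already downloaded into a String, and the pattern only handles a DOCTYPE without an internal subset.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Jena: removes the DOCTYPE declaration
// from markup so that the XML parser has no external DTD to fetch.
final class DoctypeStripper {

    // Matches "<!DOCTYPE ... >" as long as it has no internal subset ("[ ... ]").
    private static final Pattern DOCTYPE =
            Pattern.compile("<!DOCTYPE[^>\\[]*>", Pattern.CASE_INSENSITIVE);

    // Returns the markup with the first DOCTYPE declaration removed.
    static String stripDoctype(String markup) {
        return DOCTYPE.matcher(markup).replaceFirst("");
    }

    public static void main(String[] args) {
        String page =
                "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\"\n"
              + "  \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">\n"
              + "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body/></html>";
        System.out.println(stripDoctype(page));
    }
}
```

The cleaned text could then presumably be handed to the reader with r.read(m, new StringReader(cleaned), baseUri), using the read(Model, Reader, String) method of the RDFReader interface; whether the GRDDL reader behaves the same on that path is an assumption I have not verified. Note that without the DTD, named entities such as &nbsp; are no longer defined, which may be why this only works most of the time.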
Thanks,
Stijn
----
Exception output:
null
ERROR [main] (RDFDefaultErrorHandler.java:40) - java.io.IOException: Server returned HTTP response code: 503 for URL: http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd
ERROR [main] (RDFDefaultErrorHandler.java:40) - java.io.IOException: Server returned HTTP response code: 503 for URL: http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd
Exception in thread "main" net.sf.saxon.trans.DynamicError: java.io.IOException: Server returned HTTP response code: 503 for URL: http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd
at net.sf.saxon.event.Sender.sendSAXSource(Sender.java:313)
at net.sf.saxon.event.Sender.send(Sender.java:142)
at net.sf.saxon.IdentityTransformer.transform(IdentityTransformer.java:29)
at com.hp.hpl.jena.grddl.impl.GRDDL.initialParse(GRDDL.java:234)
at com.hp.hpl.jena.grddl.impl.GRDDL.go(GRDDL.java:199)
at com.hp.hpl.jena.grddl.GRDDLReader.read(GRDDLReader.java:47)
at sites.ParseSites.getModel(ParseSites.java:32)
at sites.ParseSites.queryWebSite(ParseSites.java:39)
at sites.ParseSites.main(ParseSites.java:20)
Caused by: java.io.IOException: Server returned HTTP response code: 503 for URL: http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1313)
at org.apache.xerces.impl.XMLEntityManager.setupCurrentEntity(Unknown Source)
at org.apache.xerces.impl.XMLEntityManager.startEntity(Unknown Source)
at org.apache.xerces.impl.XMLEntityManager.startDTDEntity(Unknown Source)
at org.apache.xerces.impl.XMLDTDScannerImpl.setInputSource(Unknown Source)
at org.apache.xerces.impl.XMLDocumentScannerImpl$DTDDispatcher.dispatch(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
at net.sf.saxon.event.Sender.sendSAXSource(Sender.java:300)
... 8 more
---------
java.io.IOException: Server returned HTTP response code: 503 for URL: http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1313)
at org.apache.xerces.impl.XMLEntityManager.setupCurrentEntity(Unknown Source)
at org.apache.xerces.impl.XMLEntityManager.startEntity(Unknown Source)
at org.apache.xerces.impl.XMLEntityManager.startDTDEntity(Unknown Source)
at org.apache.xerces.impl.XMLDTDScannerImpl.setInputSource(Unknown Source)
at org.apache.xerces.impl.XMLDocumentScannerImpl$DTDDispatcher.dispatch(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
at net.sf.saxon.event.Sender.sendSAXSource(Sender.java:300)
at net.sf.saxon.event.Sender.send(Sender.java:142)
at net.sf.saxon.IdentityTransformer.transform(IdentityTransformer.java:29)
at com.hp.hpl.jena.grddl.impl.GRDDL.initialParse(GRDDL.java:234)
at com.hp.hpl.jena.grddl.impl.GRDDL.go(GRDDL.java:199)
at com.hp.hpl.jena.grddl.GRDDLReader.read(GRDDLReader.java:47)
at sites.ParseSites.getModel(ParseSites.java:32)
at sites.ParseSites.queryWebSite(ParseSites.java:39)
at sites.ParseSites.main(ParseSites.java:20)
com.hp.hpl.jena.shared.JenaException: rethrew: net.sf.saxon.trans.DynamicError: java.io.IOException: Server returned HTTP response code: 503 for URL: http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd
at com.hp.hpl.jena.grddl.impl.GRDDL.initialParse(GRDDL.java:254)
at com.hp.hpl.jena.grddl.impl.GRDDL.go(GRDDL.java:199)
at com.hp.hpl.jena.grddl.GRDDLReader.read(GRDDLReader.java:47)
at sites.ParseSites.getModel(ParseSites.java:32)
at sites.ParseSites.queryWebSite(ParseSites.java:39)
at sites.ParseSites.main(ParseSites.java:20)
Caused by: net.sf.saxon.trans.DynamicError: java.io.IOException: Server returned HTTP response code: 503 for URL: http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd
at net.sf.saxon.event.Sender.sendSAXSource(Sender.java:313)
at net.sf.saxon.event.Sender.send(Sender.java:142)
at net.sf.saxon.IdentityTransformer.transform(IdentityTransformer.java:29)
at com.hp.hpl.jena.grddl.impl.GRDDL.initialParse(GRDDL.java:234)
... 5 more
Caused by: java.io.IOException: Server returned HTTP response code: 503 for URL: http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1313)
at org.apache.xerces.impl.XMLEntityManager.setupCurrentEntity(Unknown Source)
at org.apache.xerces.impl.XMLEntityManager.startEntity(Unknown Source)
at org.apache.xerces.impl.XMLEntityManager.startDTDEntity(Unknown Source)
at org.apache.xerces.impl.XMLDTDScannerImpl.setInputSource(Unknown Source)
at org.apache.xerces.impl.XMLDocumentScannerImpl$DTDDispatcher.dispatch(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
at net.sf.saxon.event.Sender.sendSAXSource(Sender.java:300)
... 8 more