jena-dev · Jena Developers
Message #25395 of 42071
Re: [jena-dev] Embedding of well-formed XML literals in SPARQL results WITHOUT tag escaping



Stu de Tejas wrote:
> Howdy folks,

Stu,

Thanks for the comment on XML literals in the SPARQL results format. How
about sending a comment to the DAWG comments list,
public-rdf-dawg-comments@...?

Discussion inline ...

>
> I have a short java patch to offer those who face a problem similar
> to something we encountered recently. We are using ARQ to perform
> SPARQL queries on a model that contains literals holding blocks of
> well-formed XML content. (I wish we didn't HAVE to do that, but
> that's another story).

Can you post an example? What do you do about namespaces (and language tags)
between the results format and the embedded XML? The results use a

<sparql xmlns="http://www.w3.org/2005/sparql-results#">

> As recently discussed here on the list,
> Jena RDF literals may be marked as well-formed XML using the datatype
> rdf:XMLLiteral (which is also reflected in the "isWellFormedXML()"
> method of the Java Literal object). This works fine; the
> hangup comes in when we do a SPARQL query that produces results of
> this type (in the SPARQL "XML-results" format -
> http://www.w3.org/TR/rdf-sparql-XMLres/ ) ,
> and then pass those results to an XSL stylesheet.
>
> The problem is that (using ARQ 1.4) the literal XML blocks are
> tag-escaped when they are written into the SPARQL results XML
> (i.e. "<tag>" becomes "&lt;tag&gt;"). The negative impact of
> this choice is that a downstream parser which is processing these
> results (e.g. an XSL processor) will treat the included block as
> text, not as XML nodes which are available for XPath selection and
> so on. In some cases (e.g. in a web-based editor, perhaps) this
> behavior may be what you want, but in our case it is not. Of course
> it is possible to work around the problem by forcing a parse of
> the XML literal before passing it to the stylesheet, but the
> fact remains that we would much rather have
> the XML in our literals be "at the same level of escaping" as
> the XML describing the rest of the SPARQL result set. So, I wrote
> a very short patch to
>
> com.hp.hpl.jena.query.resultset.XMLOutputResultSet
>
> to change the behavior. I was pleased with how easy the change was to
> make, and that's why I decided to post it here. (It would make
> sense to me if the Jena ResultSetFormatter allowed a flag called
> "escapeWellFormedLiterals" or some such to be passed in.
> But I didn't implement all that, I just changed the default behavior
> for our ARQ installation.) To install the change, you need
> to either recompile ARQ, or put your patched version of
> XMLOutputResultSet.class ahead of ARQ.jar on the classpath.
>
> The actual edit required is this: replace the single line at line
> 187 reading "out.print(xml_escape..." with the following:
>
> // BEGIN patch
> // BEFORE: ARQ 1-4 version had this single line
> // out.print(xml_escape(literal.getLexicalForm())) ;
> // AFTER: We check whether the contents are legit XML, and
> // avoid escaping if they are.
>
> String literalLexicalForm = literal.getLexicalForm();
> boolean wellFormed = literal.isWellFormedXML();
> String literalOutput = (wellFormed) ? literalLexicalForm
> : xml_escape(literalLexicalForm);
> out.print(literalOutput);
> // END patch

Firstly, note that ARQ can do this so easily only because it does not use an
XML writer based on DOM, SAX or the like. Because the SPARQL results format is
so simple, I just write the raw XML out (same for JSON), which guarantees
streaming.

Reading results is based on StAX (or SAX). The ARQ result set reader will not
work on this result set - it takes the text from the <binding> as the literal
lexical form.
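
To illustrate why a text-only reader trips over unescaped markup, here is a
minimal sketch using the JDK's StAX API. It is not ARQ's reader; the class
and method names (LiteralTextDemo, readLiteral) are made up for this example,
and the input is a cut-down fragment rather than a full results document.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class, not part of ARQ.
public class LiteralTextDemo {

    // Returns the text of the first <literal> element.
    // Throws IllegalStateException if that element is not text-only.
    static String readLiteral(String xml) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && reader.getLocalName().equals("literal")) {
                    // getElementText() only accepts text-only elements.
                    return reader.getElementText();
                }
            }
            throw new IllegalArgumentException("no <literal> element found");
        } catch (XMLStreamException e) {
            throw new IllegalStateException("literal is not text-only", e);
        }
    }

    public static void main(String[] args) {
        String escaped =
            "<binding><literal>&lt;tag&gt;hi&lt;/tag&gt;</literal></binding>";
        String unescaped =
            "<binding><literal><tag>hi</tag></literal></binding>";

        // Entity replacement restores the lexical form: <tag>hi</tag>
        System.out.println(readLiteral(escaped));

        try {
            readLiteral(unescaped);
        } catch (IllegalStateException e) {
            System.out.println("unescaped literal rejected by text-only read");
        }
    }
}
```

With the escaped form the reader gets the lexical form back unchanged; with
the unescaped form it meets a start tag where it expects text.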

> Do others agree that "unescaped well-formed XML literals" should be a
> legitimate output mode for ARQ?

The effect you want is like rdf:parseType="Literal" in RDF/XML. This is very
complicated for the reader in the general case of XML namespaces and language
tags.
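
A small sketch of the namespace side of this, assuming a namespace-aware DOM
parser from the JDK. The class and method names (NamespaceCaptureDemo,
namespaceOfFirst) are invented for the example, and the results document is
reduced to the bare minimum. An element that had no namespace in the original
literal picks up the results format's default namespace once it is embedded
unescaped.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of ARQ.
public class NamespaceCaptureDemo {

    // Parses xml and returns the namespace URI of the first element
    // with the given local name (null means "no namespace").
    static String namespaceOfFirst(String xml, String localName) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Element element =
                (Element) doc.getElementsByTagNameNS("*", localName).item(0);
            return element.getNamespaceURI();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String literalAlone = "<tag>hi</tag>";
        String embedded =
            "<sparql xmlns=\"http://www.w3.org/2005/sparql-results#\">"
            + "<literal><tag>hi</tag></literal></sparql>";

        // Parsed on its own, <tag> has no namespace.
        System.out.println(namespaceOfFirst(literalAlone, "tag"));
        // Embedded unescaped, <tag> inherits the default namespace.
        System.out.println(namespaceOfFirst(embedded, "tag"));
    }
}
```

A reader that wanted to recover the original literal would have to undo this,
and handle xml:lang scoping in the same way.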

> Two very squishy datapoints:
>
> 1) My perusal of http://www.w3.org/TR/rdf-sparql-XMLres/
> leaves me with the impression that the spec is open on this point.

It says:

"""
RDF Typed Literal S with datatype URI D
<binding><literal datatype="D">S</literal></binding>
"""
S is the lexical form of the literal, and in the XML output it must be the
lexical form itself, not some XML that will turn into the lexical form.

With a plain string, any < needs to be turned into an entity to hide it from
the XML parser.

Hence the escaping: it writes < (escaping > is not necessary, but I prefer to)
as uninterpreted characters that, after entity replacement, put the characters
of the lexical form back into the literal on reading.
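
As a rough sketch of that kind of escaping (this is not ARQ's xml_escape; the
class and method names here are made up), the following replaces the three
characters involved with their predefined entities:

```java
// Hypothetical helper, not part of ARQ.
public class XmlEscapeSketch {

    // Escapes &, < and > so the string can be written as element text.
    // Escaping > is optional in element content; it is done for symmetry.
    static String xmlEscape(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                default:  out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Prints: &lt;tag&gt;a &amp; b&lt;/tag&gt;
        System.out.println(xmlEscape("<tag>a & b</tag>"));
    }
}
```

An XML parser reading that text back performs the entity replacement, so the
consumer sees the original characters of the lexical form.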

The XML schema for the results format:
<xs:element name="literal">
  <xs:complexType mixed="true">
    <xs:attribute name="datatype" type="res:URI-reference"/>
    <xs:attribute ref="xml:lang"/>
  </xs:complexType>
</xs:element>

It's a complexType to allow the attributes. There is no child <xs:element>,
and XML Schema content models are closed.

The RelaxNG is:

literal = element res:literal {
    datatypeAttr?, xmlLang?,
    text
}

It has a text body. (The RelaxNG was used to create the XML schema :-)


A design goal for the format was to support XML schema-driven processing. One
of the criticisms of RDF/XML is the lack of an XML Schema for it. Arbitrary
embedded XML is part of that problem.

My reading is that it is not open (but I know the design criteria as well so
it might bias my reading :-). It could be clearer in the TR - so if you could
send a comment to public-rdf-dawg-comments@... that would be great.

ARQ will follow whatever is decided by DAWG.

> 2) I believe that in Sesame currently you CAN
> switch back and forth between these behaviors (But...uh...now I
> can't find the discussion page that made me think this).

The only control I could find is SPARQLResultsWriter.setPrettyPrint. If you
do find that discussion, could you forward it?

>
> Stu Baurmann

Andy



Sun Sep 24, 2006 10:18 am

andyseaborne