Elliotte Rusty Harold wrote:
> The XML grammar splits processing instructions into a target and
> data.
Hi,
I think the XML 1.0 grammar has no definition of PI data, only of PITarget;
PI data was introduced by the XML Infoset as [content].
A processing instruction is defined in XML 1.0, Section 2.6 Processing Instructions:
http://www.w3.org/TR/REC-xml#sec-pi
[16] PI ::= '<?' PITarget (S (Char* - (Char* '?>' Char*)))? '?>'
[17] PITarget ::= Name - (('X' | 'x') ('M' | 'm') ('L' | 'l'))
and the [content] property of the Processing Instruction Information Item is defined in
http://www.w3.org/TR/xml-infoset/#infoitem.pi
(...) [content] A string representing the content of the processing instruction,
excluding the target and any white space immediately following it.
If there is no such content, the value of this property will be an empty
string.(...)
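To make productions [16] and [17] and the Infoset [content] rule concrete, here is a rough sketch that splits raw PI markup with a regular expression. It is only an approximation: the class name PIGrammarSketch and its split() method are hypothetical (not part of XmlPull or any XML library), and PITarget is loosened to "any run of characters other than white space and '?'" instead of the real Name production.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of XmlPull.
final class PIGrammarSketch {
    // Approximates [16]: '<?' PITarget (S data)? '?>'
    // Assumption: PITarget is any run of characters other than S and '?',
    // which is looser than the real Name production.
    // The data part may not contain the string "?>".
    private static final Pattern PI = Pattern.compile(
        "<\\?([^ \\t\\r\\n?]+)(?:[ \\t\\r\\n]+((?:(?!\\?>).)*))?\\?>",
        Pattern.DOTALL);

    // Returns {target, content}, or null if the markup is not a PI
    // under this approximation.
    static String[] split(String markup) {
        Matcher m = PI.matcher(markup);
        if (!m.matches()) {
            return null;
        }
        String target = m.group(1);
        // [17] excludes the reserved name "xml" in any letter case.
        if (target.equalsIgnoreCase("xml")) {
            return null;
        }
        // Infoset: if there is no content, the value is an empty string.
        String content = (m.group(2) == null) ? "" : m.group(2);
        return new String[] { target, content };
    }
}
```

With this sketch, "<?target?>" should give the target "target" and an empty content string, and anything that looks like an XML declaration ("<?xml ...?>") is rejected.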
> For example, consider this processing instruction:
>
> <?xml-stylesheet href='test.css' type='text/css'?>
>
> The target is xml-stylesheet. The data is "href='test.css'
> type='text/css'". The target is separated from the data by white
> space. There is no requirement that the data have any particular
> syntax other than not containing the string "?>".
>
> Currently, XMLPULL munges the target and the data together. Both are
> returned by getText().
>
> I'm thinking it would be more convenient to have getName() return the
> target and getText() return the data. Most uses of processing
> instructions I'm familiar with do require this distinction to be
> made. We might as well make it by default.
XmlPull returns exactly this "PITarget (S (Char* - (Char* '?>' Char*)))?" from getText(),
so it is very easy to split a processing instruction into target and data. To make
it even easier I have added two methods, getPITarget() and getPIData(),
to the utility class XmlPullWrapper.java, which is available from:
http://www.extreme.indiana.edu/~aslom/xmlpull/org/xmlpull/v1/util/
They do this:
/**
 * Return PITarget from Processing Instruction (PI) as defined in
 * XML 1.0 Section 2.6 Processing Instructions
 * <code>[16] PI ::= '<?' PITarget (S (Char* - (Char* '?>' Char*)))? '?>'</code>
 */
public String getPITarget()
{
    int eventType;
    try {
        eventType = pp.getEventType();
    } catch(XmlPullParserException ex) {
        // should never happen ...
        throw new IllegalStateException(
            "could not determine parser state: "+ex);
    }
    if( eventType != pp.PROCESSING_INSTRUCTION ) {
        throw new IllegalStateException(
            "parser must be on processing instruction and not "
                +pp.TYPES[ eventType ]);
    }
    String PI = pp.getText();
    for (int i = 0; i < PI.length(); i++)
    {
        if( isS(PI.charAt(i)) ) {
            // assert i > 0
            return PI.substring(0,i);
        }
    }
    return PI;
}
/**
 * Return everything past PITarget and S from Processing Instruction (PI) as
 * defined in XML 1.0 Section 2.6 Processing Instructions
 * <code>[16] PI ::= '<?' PITarget (S (Char* - (Char* '?>' Char*)))? '?>'</code>
 *
 * <p><b>NOTE:</b> if there is no PI data it returns empty string.
 */
public String getPIData()
{
    int eventType;
    try {
        eventType = pp.getEventType();
    } catch(XmlPullParserException ex) {
        // should never happen ...
        throw new IllegalStateException(
            "could not determine parser state: "+ex);
    }
    if( eventType != pp.PROCESSING_INSTRUCTION ) {
        throw new IllegalStateException(
            "parser must be on processing instruction and not "
                +pp.TYPES[ eventType ]);
    }
    String PI = pp.getText();
    int pos = -1;
    for (int i = 0; i < PI.length(); i++)
    {
        if( isS(PI.charAt(i)) ) {
            pos = i;
        } else if(pos > 0) {
            return PI.substring(i);
        }
    }
    return "";
}
/**
 * Return true if character is S as defined in XML 1.0
 * <code>S ::= (#x20 | #x9 | #xD | #xA)+</code>
 */
private static boolean isS(char ch) {
    return (ch == ' ' || ch == '\n' || ch == '\r' || ch == '\t');
}
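To show what the two methods above should return without needing a parser, here is a minimal standalone sketch that applies the same splitting to a plain string, such as the one getText() returns for a PI. The class name PISplit and its method names are hypothetical and not part of XmlPullWrapper; the sample string is the xml-stylesheet example quoted earlier.

```java
// Standalone sketch; PISplit and its method names are hypothetical,
// not part of XmlPull or XmlPullWrapper.
final class PISplit {
    // S ::= (#x20 | #x9 | #xD | #xA)+ in XML 1.0
    static boolean isS(char ch) {
        return ch == ' ' || ch == '\t' || ch == '\r' || ch == '\n';
    }

    // Everything before the first white space character (the PITarget).
    static String target(String pi) {
        for (int i = 0; i < pi.length(); i++) {
            if (isS(pi.charAt(i))) {
                return pi.substring(0, i);
            }
        }
        return pi;
    }

    // Everything after the target and the white space that follows it;
    // an empty string if there is no data.
    static String data(String pi) {
        int i = target(pi).length();
        while (i < pi.length() && isS(pi.charAt(i))) {
            i++;
        }
        return pi.substring(i);
    }

    public static void main(String[] args) {
        // The text as getText() would return it for the quoted example.
        String pi = "xml-stylesheet href='test.css' type='text/css'";
        System.out.println(target(pi)); // prints xml-stylesheet
        System.out.println(data(pi));   // prints href='test.css' type='text/css'
    }
}
```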
There are also some unit tests for it in the junit subpackage.
I hope this makes it convenient to work with PIs.
thanks,
alek
--
The ancestor of every action is a thought. - Ralph Waldo Emerson