Locating a schema
Unlike DTDs, RELAX NG does not specify a way to locate the
schema for a document. nXML mode locates the schema by consulting a
list of schema locating files. A schema locating file is an XML
document that specifies rules for locating a schema; it must be valid
with respect to the schema locate.rnc. Each file specifies a list of
rules, and the rules from all the files are concatenated in order into
a single list. To locate a schema, each rule is applied in turn until
one matches. The matching rule is then used to determine the schema.
The variable rng-schema-locating-files specifies
the list of schema locating files that nXML mode should use. It is not
an error if some of the files do not exist. If a file-name is
relative, it will be resolved relative to the document for which a
schema is being located.
You can, of course, use nXML mode itself to edit schema locating
files.
You can use the command C-c C-s to manually select
the schema for the document in the current buffer. Emacs will read the
file-name of the schema from the minibuffer. After reading the
file-name, Emacs will ask whether you wish to add a rule to a schema
locating file that persistently associates the document with the
selected schema.
C-c C-t is similar to C-c C-s. However,
instead of specifying the schema's file-name, you specify a
type identifier for the document. The type identifier will be used to
select the schema using the schema locating files. The available type
identifiers are determined by schema locating files. As with C-c
C-s, Emacs will ask whether you wish to add a rule to a schema
locating file that persistently associates the document with the
specified type identifier.
Schema locating file basics
The document element of a schema locating file must be
locatingRules and the namespace URI must be
http://thaiopensource.com/ns/locating-rules/1.0. The
children of the document element specify rules. The order of the
children is the same as the order of the rules. Here's a complete
example of a schema locating file:
<?xml version="1.0"?>
<locatingRules xmlns="http://thaiopensource.com/ns/locating-rules/1.0">
  <namespace ns="http://www.w3.org/1999/xhtml" uri="xhtml.rnc"/>
  <documentElement localName="book" uri="docbook.rnc"/>
</locatingRules>
This says to use the schema xhtml.rnc for a document whose
document element has the namespace URI http://www.w3.org/1999/xhtml,
and to use the schema docbook.rnc for a document whose document
element has the local name book. If the document element had both a
namespace URI of http://www.w3.org/1999/xhtml and a local name of
book, then the schema xhtml.rnc would be used, because the namespace
rule comes first.
As usual with XML-related technologies, resources are identified
by URIs. The uri attribute identifies the schema by
specifying the URI. The URI may be relative. If so, it is resolved
relative to the URI of the schema locating file that contains the
attribute. This means that if the value of the uri attribute
does not contain a /, then it will refer to a file in
the same directory as the schema locating file. The
xml:base attribute may be used to change the base URI
used for resolving relative URIs.
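As a minimal sketch of this last point, assume the schemas are kept in a
subdirectory named schema next to the schema locating file (the
directory name is purely illustrative). An xml:base attribute on the
document element would then make the relative uri value below resolve
to schema/xhtml.rnc:

```xml
<?xml version="1.0"?>
<!-- Hypothetical layout: schemas are kept in a "schema" subdirectory. -->
<locatingRules xmlns="http://thaiopensource.com/ns/locating-rules/1.0"
               xml:base="schema/">
  <!-- Resolved against the new base URI, i.e. schema/xhtml.rnc -->
  <namespace ns="http://www.w3.org/1999/xhtml" uri="xhtml.rnc"/>
</locatingRules>
```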
Schema locating files are designed to be useful for other
applications that need to locate a schema for a document. In fact,
there is nothing specific to locating schemas in the design of the
schema for schema locating files; it could equally well be used for
locating a stylesheet.
Using the document's URI to locate a schema
A uri rule locates a schema based on the URI
of the document. The resource attribute
locates a schema for a particular resource. For example,
<uri resource="spec.xml" uri="docbook.rnc"/>
specifies that the schema for spec.xml is
docbook.rnc. The pathSuffix attribute
locates a schema based on the suffix of the URI. It considers only
the path component of the URI. In terms of files, it is equivalent to
matching on the file extension. For example,
<uri pathSuffix=".xsl" uri="xslt.rnc"/>
specifies that the schema for documents whose URI ends with
.xsl is xslt.rnc.
A transformURI rule locates a schema by
transforming the URI of the document. If there is a
pathSuffix attribute, then the path component of the
document's URI must end with the specified suffix. If there is a
pathAppend attribute, then the URI is transformed by
appending the specified string to the URI's path component. If there
is a replacePathSuffix attribute, then the URI is
transformed by replacing the suffix matched by the
pathSuffix attribute by the value of the
replacePathSuffix attribute. A transformURI
rule matches only if the transformed URI is a valid URI that
identifies an existing resource. For example,
<transformURI pathSuffix=".xml" replacePathSuffix=".rnc"/>
specifies that to locate a schema for a document
foo.xml, Emacs should test whether a file
foo.rnc exists in the same directory as
foo.xml, and, if so, should use it as the
schema.
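The pathAppend attribute can be illustrated in the same way. The
following rule is a sketch rather than a recommended convention:
assuming schemas are named by appending .rnc to the full document
name, it would make Emacs look for foo.xml.rnc in the same directory
as foo.xml.

```xml
<!-- Illustrative only: foo.xml maps to foo.xml.rnc in the same directory -->
<transformURI pathSuffix=".xml" pathAppend=".rnc"/>
```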
Using the document element to locate a schema
A documentElement rule locates a schema based on
the local name and prefix of the document element. For example, a rule
<documentElement prefix="xsl" localName="stylesheet" uri="xslt.rnc"/>
specifies that when the name of the document element is
xsl:stylesheet, then xslt.rnc should be used
as the schema. Either the prefix or
localName attribute may be omitted to allow any prefix or
local name.
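For instance, the following sketch omits localName, so it would match
any document element that uses the prefix xsl, whatever its local name
(xsl:stylesheet and xsl:transform alike):

```xml
<!-- No localName attribute: any local name with the xsl prefix matches -->
<documentElement prefix="xsl" uri="xslt.rnc"/>
```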
A namespace rule locates a schema based on the
namespace URI of the document element. For example, a rule
<namespace ns="http://www.w3.org/1999/XSL/Transform" uri="xslt.rnc"/>
specifies that when the namespace URI of the document element is
http://www.w3.org/1999/XSL/Transform, then
xslt.rnc should be used as the schema.
Using the DOCTYPE declaration to locate a schema
A doctypePublicId rule locates a schema based on
the public identifier specified in the DOCTYPE
declaration. For example, a rule
<doctypePublicId publicId="-//W3C//DTD XHTML 1.0 Transitional//EN"
                 uri="xhtml1-transitional.rnc"/>
specifies that when the document has a DOCTYPE
declaration with a public identifier -//W3C//DTD XHTML 1.0
Transitional//EN, then xhtml1-transitional.rnc
should be used as the schema.
Specifying a default schema
A default rule specifies a default schema. This rule
always matches. For example,
<default uri="docbook.rnc"/>
says to use the schema docbook.rnc.
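Because rules are applied in order and a default rule always matches,
matching rules listed after it in the combined list would not normally
be reached, so a default rule usually goes last. A minimal sketch,
reusing schema names from the earlier examples:

```xml
<?xml version="1.0"?>
<locatingRules xmlns="http://thaiopensource.com/ns/locating-rules/1.0">
  <namespace ns="http://www.w3.org/1999/xhtml" uri="xhtml.rnc"/>
  <uri pathSuffix=".xsl" uri="xslt.rnc"/>
  <!-- Fallback: reached only when no rule above matches -->
  <default uri="docbook.rnc"/>
</locatingRules>
```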
Type identifiers for documents
Type identifiers allow a level of indirection in locating the
schema for a document. Instead of associating the document directly
with a schema URI, the document is associated with a type identifier,
which is in turn associated with a schema URI. nXML mode does not
constrain the format of type identifiers. They can be simply strings
without any formal structure or they can be public identifiers or
URIs. Note that these type identifiers have nothing to do with the
DOCTYPE declaration. When comparing type identifiers, whitespace is
normalized in the same way as with the xsd:token
datatype. Using type identifiers makes it easy for users to select
from a set of known schemas using C-c C-t.
Each of the rules described in previous sections that uses a
uri attribute to specify a schema can instead use a
typeId attribute to specify a type identifier. The type
identifier can be associated with a URI using a typeId
element. For example,
<locatingRules xmlns="http://thaiopensource.com/ns/locating-rules/1.0">
  <namespace ns="http://www.w3.org/1999/xhtml" typeId="XHTML"/>
  <typeId id="XHTML" typeId="XHTML Strict"/>
  <typeId id="XHTML Strict" uri="xhtml-strict.rnc"/>
  <typeId id="XHTML Transitional" uri="xhtml-transitional.rnc"/>
</locatingRules>
declares three type identifiers: XHTML (representing the
default variant of XHTML to be used), XHTML Strict and
XHTML Transitional. Such a schema locating file would
use xhtml-strict.rnc for a document whose namespace is
http://www.w3.org/1999/xhtml. But it is considerably
more flexible than a schema locating file that simply specified
<namespace ns="http://www.w3.org/1999/xhtml" uri="xhtml-strict.rnc"/>
A user can easily use C-c C-t to select between XHTML
Strict and XHTML Transitional. Also, a user can easily add a schema
locating file
<locatingRules xmlns="http://thaiopensource.com/ns/locating-rules/1.0">
  <typeId id="XHTML" typeId="XHTML Transitional"/>
</locatingRules>
that makes the default variant of XHTML be XHTML Transitional.
A typeIdProcessingInstruction rule allows a
document to specify its own typeId with a processing instruction. The
target attribute specifies the processing instruction
target that should be recognized as specifying a typeId in its
value. For example, with an additional rule
<typeIdProcessingInstruction target="my-doctype"/>
a document that started
<?my-doctype XHTML Transitional?>
<html xmlns="http://www.w3.org/1999/xhtml">
would be validated against xhtml-transitional.rnc.
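A sketch of the complete schema locating file implied by this example
follows. It combines the additional rule with the type identifier
declarations shown earlier. Placing the typeIdProcessingInstruction
rule first is an assumption made here so that, given the in-order
matching described above, a document's own declaration is considered
before the namespace-based default.

```xml
<?xml version="1.0"?>
<locatingRules xmlns="http://thaiopensource.com/ns/locating-rules/1.0">
  <!-- Tried first: honour a <?my-doctype ...?> processing instruction -->
  <typeIdProcessingInstruction target="my-doctype"/>
  <!-- Otherwise fall back to the default variant for the XHTML namespace -->
  <namespace ns="http://www.w3.org/1999/xhtml" typeId="XHTML"/>
  <typeId id="XHTML" typeId="XHTML Strict"/>
  <typeId id="XHTML Strict" uri="xhtml-strict.rnc"/>
  <typeId id="XHTML Transitional" uri="xhtml-transitional.rnc"/>
</locatingRules>
```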
A typeIdBase rule makes it possible to avoid having
to add an explicit rule for every typeId. For example, a rule
<typeIdBase append=".rnc"/>
occurring in a schema locating file
/home/jjc/schema/schemas.xml would make Emacs try to use
the file /home/jjc/schema/DocBook.rnc for a type identifier
of DocBook; it would test whether that file existed, and
if it did, it would use it. In terms of URIs, Emacs appends the value
of the append attribute to the typeId; it then %-escapes
all URI-significant characters; this is then treated as a relative URI
and resolved relative to the base URI applicable to the
typeIdBase element. The typeId will be mapped to this
URI, provided that the URI identifies an existing resource.
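Putting this together, a schema locating file that relies on
typeIdBase might look like the following sketch. The documentElement
rule and the DocBook type identifier are chosen for illustration. No
typeId element for DocBook is needed, provided that DocBook.rnc exists
in the same directory as the schema locating file.

```xml
<?xml version="1.0"?>
<locatingRules xmlns="http://thaiopensource.com/ns/locating-rules/1.0">
  <!-- Associate documents whose document element is "book" with a type identifier -->
  <documentElement localName="book" typeId="DocBook"/>
  <!-- Map a type identifier X to X.rnc, if that file exists -->
  <typeIdBase append=".rnc"/>
</locatingRules>
```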
Using multiple schema locating files
The include element includes rules from another
schema locating file. The behavior is exactly as if the rules from
that file were included in place of the include element. Relative URIs are resolved into absolute URIs before the inclusion is
performed. For example,
<include rules="../rules.xml"/>
includes the rules from rules.xml.
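In context, a private schema locating file might add a rule of its own
and then pull in a shared set of rules. This is only a sketch: the
location ../rules.xml is taken from the example above, and the
namespace rule is illustrative.

```xml
<?xml version="1.0"?>
<locatingRules xmlns="http://thaiopensource.com/ns/locating-rules/1.0">
  <!-- Local rule, tried first -->
  <namespace ns="http://www.w3.org/1999/xhtml" uri="xhtml.rnc"/>
  <!-- Rules from ../rules.xml are tried next, as if written here -->
  <include rules="../rules.xml"/>
</locatingRules>
```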
The process of locating a schema takes as input a list of schema
locating files. The rules in all these files and in the files they
include are resolved into a single list of rules, which are applied
strictly in order. Sometimes this order is not what is needed. For example, suppose you have two schema locating files, a private
file
<locatingRules xmlns="http://thaiopensource.com/ns/locating-rules/1.0">
  <namespace ns="http://www.w3.org/1999/xhtml" uri="xhtml.rnc"/>
</locatingRules>
followed by a public file
<locatingRules xmlns="http://thaiopensource.com/ns/locating-rules/1.0">
  <transformURI pathSuffix=".xml" replacePathSuffix=".rnc"/>
  <namespace ns="http://www.w3.org/1999/XSL/Transform" typeId="XSLT"/>
</locatingRules>
The effect of these two files is that the XHTML namespace
rule takes precedence over the transformURI rule, which
is almost certainly not what is needed. This can be solved by adding
an applyFollowingRules element to the private file, ahead of the
namespace rule:
<locatingRules xmlns="http://thaiopensource.com/ns/locating-rules/1.0">
  <applyFollowingRules ruleType="transformURI"/>
  <namespace ns="http://www.w3.org/1999/xhtml" uri="xhtml.rnc"/>
</locatingRules>