Everyone,
As part of the effort to develop a set of special OpenReader namespace
elements which may be applied to any OR-supported content document
vocabulary, I'm interested in established 'standards', or general
principles, for markup that adds indexing information *directly* to
content.
This is NOT the same as building a "back-of-book" index which links to
points within documents. Rather, it means placing the book indexing
information (the "metadata") right within the content, allowing
OpenReader user agents to extract the information and machine-build
linkable indexes, as well as use the information in other ways.
For example, here's a paragraph we might find in a content document:
<p>My favorite breed of dog is the Australian Terrier.</p>
We might add indexing information as follows (note, this is NOT a
specific proposal, but simply something I pulled out of the air for
illustrating what I'm talking about):
<p>My favorite breed of dog is the
<or:indexitem xml:id="term259"
term="dog breeds -- Australian Terrier"/>
Australian Terrier<or:endindexitem idref="term259"/>.</p>
The element <or:indexitem/> assigns the index item term (metadata,
here "dog breeds -- Australian Terrier"), and the optional
<or:endindexitem/> simply closes the <or:indexitem/> to define the
range that the index term applies to.
[The main reason I use two empty tags to define the range the index
term applies to, rather than a single non-empty tag, is to allow the
range to cross XML hierarchies. If <or:endindexitem/> is missing for a
given <or:indexitem/>, then the index metadata applies at the point of
occurrence with unspecified range.]
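To make the range idea concrete, here is a rough Python sketch of how a user agent might pair the two empty tags and collect the text between them, even when the range crosses element boundaries. It is only an illustration under assumptions: the namespace URI bound to the or: prefix and the function name extract_index are made up for this example, and the element and attribute names are just the ones pulled out of the air above.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI for the "or:" prefix (an assumption for this sketch).
OR_NS = "http://example.org/openreader"
# The xml: prefix is predefined by the XML Namespaces spec.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def extract_index(xml_text):
    """Return a list of (term, covered_text) pairs in closing order.

    covered_text is None when an indexitem has no matching endindexitem,
    i.e. the term applies at a point with unspecified range.
    """
    root = ET.fromstring(xml_text)
    terms = {}        # item id -> index term
    open_ranges = {}  # item id -> text fragments seen since the range opened
    entries = []

    def feed(text):
        # Append character data to every range that is currently open.
        if text:
            for parts in open_ranges.values():
                parts.append(text)

    def walk(elem):
        if elem.tag == f"{{{OR_NS}}}indexitem":
            item_id = elem.get(XML_ID)
            terms[item_id] = elem.get("term")
            open_ranges[item_id] = []
        elif elem.tag == f"{{{OR_NS}}}endindexitem":
            item_id = elem.get("idref")
            if item_id in open_ranges:
                raw = "".join(open_ranges.pop(item_id))
                # Normalize runs of whitespace to single spaces.
                entries.append((terms.pop(item_id), " ".join(raw.split())))
        else:
            feed(elem.text)
        for child in elem:
            walk(child)
            feed(child.tail)  # text following the child, in document order

    walk(root)
    # Anything still open never saw an end tag: record it as a point entry.
    for item_id in open_ranges:
        entries.append((terms[item_id], None))
    return entries


sample = (
    '<p xmlns:or="http://example.org/openreader">My favorite breed of dog is the\n'
    '<or:indexitem xml:id="term259"\n'
    'term="dog breeds -- Australian Terrier"/>\n'
    'Australian Terrier<or:endindexitem idref="term259"/>.</p>'
)
print(extract_index(sample))
```

Because the walk is in document order and ignores nesting, the start and end tags do not need to share a parent element, which is the point of using two empty tags.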
One complicating factor is representing index term hierarchy -- we may
need to apply terms in some hierarchical fashion, e.g.:
<or:indexitem xml:id="term259"
term1="dog breeds"
term2="Australian Terrier"/>
(We could even have term3, term4, term5, etc.)
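As a rough illustration of how numbered term attributes could be turned into a nested index, here is a small Python sketch. The helper names term_path and add_entry are hypothetical, and treating " -- " inside a single term attribute as a level separator is purely an assumption for the example, not part of any proposal.

```python
def term_path(attrs):
    """Return the list of headings for one index item, top level first.

    Reads term1, term2, ... in order and stops at the first gap. If no
    numbered attributes exist, falls back to splitting a single "term"
    attribute on "--" (an assumed convention).
    """
    path = []
    level = 1
    while f"term{level}" in attrs:
        path.append(attrs[f"term{level}"])
        level += 1
    if not path and "term" in attrs:
        path = [part.strip() for part in attrs["term"].split("--")]
    return path


def add_entry(index, path, target_id):
    """Insert target_id under the nested headings given by path."""
    children = index
    node = None
    for heading in path:
        node = children.setdefault(heading, {"ids": [], "children": {}})
        children = node["children"]
    if node is not None:
        node["ids"].append(target_id)  # link target for the deepest heading


index = {}
add_entry(
    index,
    term_path({"term1": "dog breeds", "term2": "Australian Terrier"}),
    "term259",
)
print(index)
```

Each node keeps the ids of the index items that point to it, so a user agent could render "dog breeds" as a heading with "Australian Terrier" as a linked sub-entry.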
Any thoughts, suggestions, recommendations, criticisms? Anyone here
know an expert I could consult with? It would not surprise me if
someone or some group has already designed something to do this, but I
haven't found that something yet.
Thanks.
Jon Noring