openreader-format · OpenReader Publication Working Group

"Book Indexing" markup in XML documents?
Message #357 of 375
Everyone,

As part of the effort to develop a set of special OpenReader namespace
elements which may be applied to any OR-supported content document
vocabulary, I'm interested in established 'standards', or general
principles, for markup that adds indexing information *directly* to
content.

This is NOT the same as building a "back-of-book" index which links to
points within documents. Rather, it means placing the book indexing
information (the "metadata") right within the content, so that
OpenReader user agents can extract the information, machine-build
linkable indexes, and reuse the information in other ways.

For example, here's a paragraph we might find in a content document:

<p>My favorite breed of dog is the Australian Terrier.</p>

We might add indexing information as follows (note, this is NOT a
specific proposal, but simply something I pulled out of the air for
illustrating what I'm talking about):

<p>My favorite breed of dog is the
<or:indexitem xml:id="term259"
term="dog breeds -- Australian Terrier"/>
Australian Terrier<or:endindexitem idref="term259"/>.</p>

The element <or:indexitem/> assigns the index item term (metadata,
here "dog breeds -- Australian Terrier"), and the optional
<or:endindexitem/> simply closes the <or:indexitem/> to define the
range that the index term applies to.
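
As a rough sketch of the extraction step a user agent might perform on the example above (Python is chosen only for illustration; the namespace URI bound to the or: prefix and the function name collect_index_items are assumptions, since the post defines neither):

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI; the original post does not define one.
OR_NS = "http://example.org/openreader-sketch"
# ElementTree exposes xml:id under the predefined XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

SAMPLE = f"""<p xmlns:or="{OR_NS}">My favorite breed of dog is the
<or:indexitem xml:id="term259"
term="dog breeds -- Australian Terrier"/>
Australian Terrier<or:endindexitem idref="term259"/>.</p>"""


def collect_index_items(xml_text):
    """Return {term: [xml:id, ...]} for every or:indexitem in the document."""
    root = ET.fromstring(xml_text)
    index = {}
    for item in root.iter(f"{{{OR_NS}}}indexitem"):
        term = item.get("term")
        if term is None:
            continue  # nothing to index without a term
        index.setdefault(term, []).append(item.get(XML_ID))
    return index


index = collect_index_items(SAMPLE)
for term in sorted(index):
    # Each id can serve as a link target back into the content.
    print(term, "->", ", ".join(f"#{i}" for i in index[term]))
```

The ids collected per term are what would make the generated index linkable.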

[The main reason I use two empty tags to define the range the index
term applies to, rather than a single non-empty tag, is to allow the
range to cross XML element boundaries, e.g. to start in one paragraph
and end in the next. If <or:endindexitem/> is missing for a given
<or:indexitem/>, then the index metadata applies at the point of
occurrence, with unspecified range.]
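
To illustrate why paired empty tags can span element boundaries, here is a minimal sketch that walks a document in reading order and collects the text between a start marker and its end marker. The namespace URI, the two-paragraph sample, and the name indexed_ranges are assumptions made for this illustration, not part of any proposal:

```python
import xml.etree.ElementTree as ET

OR_NS = "http://example.org/openreader-sketch"  # hypothetical namespace URI
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
START = f"{{{OR_NS}}}indexitem"
END = f"{{{OR_NS}}}endindexitem"

# Made-up sample: the range opens in the first <p> and closes in the second.
SAMPLE = f"""<body xmlns:or="{OR_NS}"><p>I like the
<or:indexitem xml:id="term259" term="dog breeds -- Australian Terrier"/>Australian
Terrier.</p>
<p>It is a small dog<or:endindexitem idref="term259"/> from Australia.</p></body>"""


def indexed_ranges(xml_text):
    """Map each indexitem id to the text between it and its endindexitem.

    An item with no matching endindexitem maps to None, meaning the term
    applies at the point of occurrence only.
    """
    root = ET.fromstring(xml_text)
    open_ranges = {}  # id -> text chunks collected so far
    closed = {}       # id -> whitespace-normalized text of the finished range

    def add_text(text):
        if text:
            for chunks in open_ranges.values():
                chunks.append(text)

    def walk(elem):
        if elem.tag == START:
            open_ranges[elem.get(XML_ID)] = []
        elif elem.tag == END:
            chunks = open_ranges.pop(elem.get("idref"), None)
            if chunks is not None:
                closed[elem.get("idref")] = " ".join("".join(chunks).split())
        add_text(elem.text)
        for child in elem:
            walk(child)
            add_text(child.tail)  # text that follows the child, in document order

    walk(root)
    for item_id in open_ranges:
        closed[item_id] = None  # never closed: point of occurrence only
    return closed


print(indexed_ranges(SAMPLE))
```

A single non-empty element could not express this range, because it would have to open inside one paragraph and close inside another.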

One complicating factor is representing index term hierarchy -- we may
need to apply terms in some hierarchical fashion, e.g.:

<or:indexitem xml:id="term259"
term1="dog breeds"
term2="Australian Terrier"/>

(We could even have term3, term4, term5, etc.)
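
One way a user agent might turn numbered term attributes into a nested index is sketched below. The attribute pattern term1, term2, ... follows the example above; the namespace URI, the extra sample entries, and the function names are assumptions for illustration only:

```python
import re
import xml.etree.ElementTree as ET

OR_NS = "http://example.org/openreader-sketch"  # hypothetical namespace URI
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Made-up sample with three index items sharing a top-level term.
SAMPLE = f"""<doc xmlns:or="{OR_NS}">
<or:indexitem xml:id="term259" term1="dog breeds" term2="Australian Terrier"/>
<or:indexitem xml:id="term260" term1="dog breeds" term2="Border Collie"/>
<or:indexitem xml:id="term261" term1="dog breeds"/>
</doc>"""


def term_path(item):
    """Return the item's terms ordered by level: term1, term2, term3, ..."""
    levels = []
    for name, value in item.attrib.items():
        match = re.fullmatch(r"term(\d+)", name)
        if match:
            levels.append((int(match.group(1)), value))
    return [value for _, value in sorted(levels)]


def build_tree(xml_text):
    """Build {term: {"ids": [...], "sub": {...}}} from all or:indexitem elements."""
    root = ET.fromstring(xml_text)
    tree = {}
    for item in root.iter(f"{{{OR_NS}}}indexitem"):
        path = term_path(item)
        if not path:
            continue  # no termN attributes on this item
        level = tree
        for term in path:
            node = level.setdefault(term, {"ids": [], "sub": {}})
            level = node["sub"]
        # The id is attached to the deepest term on the path.
        node["ids"].append(item.get(XML_ID, "?"))
    return tree


def render(tree, depth=0):
    """Return the index as indented lines, terms sorted alphabetically."""
    lines = []
    for term in sorted(tree):
        node = tree[term]
        refs = ", ".join(node["ids"])
        lines.append("  " * depth + term + (f"  [{refs}]" if refs else ""))
        lines.extend(render(node["sub"], depth + 1))
    return lines


print("\n".join(render(build_tree(SAMPLE))))
```

Numbered attributes keep each level separate, at the cost of an open-ended attribute set that a schema cannot easily enumerate.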

Any thoughts, suggestions, recommendations, criticisms? Anyone here
know an expert I could consult with? It would not surprise me if
someone or some group has already designed something to do this, but I
haven't found that something yet.

Thanks.

Jon Noring

Thu Oct 5, 2006 1:28 am

jon_noring

Everyone, As part of the effort to develop a set of special OpenReader namespace elements which may be applied to any OR supported content document vocabulary,...
Jon Noring
jon_noring
Oct 5, 2006
1:29 am

Hi Jon, ... This looks rather like DocBook indexing markup. Cheers, Michael -- Print XML with Prince! http://www.princexml.com...
Michael Day
mikeday@...
Oct 5, 2006
1:58 am

... Hmmm, maybe, I'll have to redig into DocBook. The section I looked at with regards to DocBook indexing did not appear to embed indexing information within...
Jon Noring
jon_noring
Oct 5, 2006
2:58 am

... You place <indexterm> elements through the document, then at the end place an empty <index/> element which is where the actual index will be generated. ...
Michael Day
mikeday@...
Oct 5, 2006
4:54 am

... Cool! That clarifies it. It also shows I was close, although the indexing terms in DocBook are part of content, while in my example, they were attribute...
Jon Noring
jon_noring
Oct 5, 2006
5:25 am

Some limitations of the brief sample Jon provided and IIRC of DocBook (but lord knows, I am no DocBook expert): * Only provides for 1 index. * Definitionally...
Syd Bauman
syd_bauman
Oct 5, 2006
7:13 pm

... Thanks! From my initial study, it appears TEI is a little more powerful at representing embedded indexing information than is DocBook. In my chats with a...
Jon Noring
jon_noring
Oct 5, 2006
9:25 pm

Please also include out-of-band indexterm entries. I've had a lot of utility from DocBook indexterm's zone attribute: ...
Peter Ring
peter17ring
Oct 6, 2006
8:32 am

... Again, interesting info. The feedback I've been getting, from here and from Jon Jermey and David Ream (who is one of the top experts in the area of XML and...
Jon Noring
jon_noring
Oct 6, 2006
10:13 pm

Indexing is a hugely important type of metadata, in my opinion. I suppose that the context in which we are carrying out this discussion is mostly in regard...
rickbarry@...
rickbarry1
Oct 8, 2006
5:09 am

... Rick's message reply is an outstanding summary of several issues related to indexing. What I found of most interest is looking at author-supplied indexing,...
Jon Noring
jon_noring
Oct 16, 2006
5:36 pm