Indexing is a hugely important type of metadata, in my opinion. I suppose
that the context in which we are carrying out this discussion is mostly in
regard to indexes appearing at the back of books in the form of subject, author
and/or maybe URL indexes. It seems to me that indexing can be thought of
at a number of levels in this context, and I think it would help to distinguish
among them. It would also help to distinguish between indexing embedded in the
book text itself, indexing that accompanies the text in a separate attribute
file, and indexing held in non-embedded attribute databases.
The first level that comes to mind is the original author-supplied index
that is often, but not always, published as part of a book at its end. I've not
personally seen a novel with such an index.
Second, which I find is more often the case, is original publisher-supplied
indexing. Since most common authoring tools, e.g. MSWord, do not have serious
indexing tools, authors often depend on the publisher to provide indexing at
the end of the book.
I sometimes do professional book reviews and usually comment on not only the
content, but the structure (TOC, chapter organization) and context (index,
chapter titles). I have found that in most cases I give the books very poor
grades on the latter two scores, especially indexes. The indexes are typically
too short and poorly done: they omit some of the pages for listed terms, and
they omit altogether some important terms, even ones that appear frequently
in the text. They also lack thesaurus treatment to lead people to
information through conceptually related terms that are well used in the field
but do not necessarily appear in the book, even though the concepts do.
Thesaurus treatment can provide capabilities beyond typical full-text searching.
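To make the thesaurus point concrete, here is a minimal sketch in Python. The index terms, page numbers and thesaurus entries are all made up for illustration, and `lookup` is a hypothetical helper, not part of any real indexing tool. It only shows how a term that never appears in the book can still lead a reader to the right pages.

```python
# Hypothetical back-of-book index: printed index term -> page numbers.
book_index = {
    "service contract": [41, 97, 203],
    "loose coupling": [58, 59, 144],
}

# Hypothetical thesaurus: a term used in the field -> index terms
# that cover the same concept in this particular book.
thesaurus = {
    "interface agreement": ["service contract"],
    "decoupling": ["loose coupling"],
}


def lookup(term):
    """Return {index_term: pages}, following thesaurus cross-references."""
    key = term.strip().lower()
    if key in book_index:
        # The term is printed in the index, so no thesaurus help is needed.
        return {key: book_index[key]}
    # Otherwise fall back to related terms that the index does contain.
    return {t: book_index[t] for t in thesaurus.get(key, []) if t in book_index}


print(lookup("Decoupling"))
```

A plain full-text search for "decoupling" would find nothing if the word never occurs in the text; the cross-reference is what gets the reader to the pages on loose coupling.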
There are of course notable exceptions to the above shortcomings, such as
Thomas Erl's books on Service-Oriented Architecture, which have both elaborate
TOCs and extensive indexes.
These considerations are not trivial of course, particularly with
professional books. Unlike novels, which you pick up and read in a linear
fashion, professional books are bought to be read (at least the portions that
are of special interest to the reader, assuming the TOC leads them to such
places) but also, and sometimes more importantly, to be used as future
references when the reader needs to revisit a particular issue or subject area.
That requires a very well done index where topics are many and complicated, or
an elaborate and well organized TOC where topics are few and reasonably
self-contained, and ideally both.
I was recently contacted by a frequently published author who is well
established in his field about a book he was writing. I asked him how he was
approaching the index. He quickly replied, to my surprise: "Oh, I leave that to
the publisher." But clearly, the author's and publisher's interests and
incentives are different. In another recent personal experience, I had a
chapter being published in a book. Nearly 2 years after I wrote the chapter, I
offered the publisher an updated version of the material. The publisher said:
"No. Since we would have to do the index all over again, we would prefer to use
the older material." Instead, they offered to place a note in my chapter
indicating that an updated version was available on my website! We agreed on
that, and I had the updated version on my website, with the publisher's
agreement, before the book hit the streets.
The first two types of indexes noted above apply to both hard-copy books
and electronic-only books, which brings us to electronic publishing. Once we
consider publishing digitized versions of books, including any published index,
we retain the first two types but also open up post-publication possibilities
beyond them.
A third (post-publication) category is indexing provided by an author who
discovers after the fact that a lousy job was done on the index of his/her
book, OR by an electronic publisher who adds value to the book by providing a
much enhanced, augmented index beyond that in the original hard-copy version.
This could be embedded with the electronic book as an attached metafile, or
supplied by the electronic publisher in its own separately controlled metadata
database along with its other publications.
Fourthly, consumer-readers may want to add their own index terms that are
particularly relevant to their own interests, needs or clients, as a special
kind of personal annotation or digital marginalia.
The latter two categories may also provide interesting possibilities for
indexing fiction.
With these distinctions in mind, it seems to me that, for logical and
possibly IPR reasons, only original author/publisher-supplied indexing
should be embedded as part of the text. Additional indexing, e.g.,
reader-supplied or as might be done by an electronic publisher (including one
using an OpenReader-compliant system, though I understand that is not currently
part of the OpenReader design specs), should be provided only in an attribute
file that is not embedded in the text itself, though the file may or may not be
packaged with the text file. It may also make sense to include original
author/publisher-provided indexing along with such attributes.
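As a rough illustration of the separation I am suggesting, the sketch below tags each index entry with one of the four sources discussed above and routes it either to the text or to an attribute file. All names here (`Source`, `IndexEntry`, `partition`, `copy_originals`) are my own inventions for the example and do not come from the OpenReader specs or any other format.

```python
from dataclasses import dataclass
from enum import Enum


class Source(Enum):
    AUTHOR = 1            # original author-supplied index
    PUBLISHER = 2         # original publisher-supplied index
    POST_PUBLICATION = 3  # later author or electronic-publisher enhancement
    READER = 4            # reader's own terms (digital marginalia)


# Only original indexing is allowed to live inside the text itself.
ORIGINAL = {Source.AUTHOR, Source.PUBLISHER}


@dataclass(frozen=True)
class IndexEntry:
    term: str
    locator: str  # page number, anchor id, etc. (format left open)
    source: Source


def partition(entries, copy_originals=False):
    """Split entries into (embedded_in_text, attribute_file)."""
    embedded = [e for e in entries if e.source in ORIGINAL]
    attributes = [e for e in entries if e.source not in ORIGINAL]
    if copy_originals:
        # Optionally mirror the original index in the attribute file too.
        attributes = embedded + attributes
    return embedded, attributes


entries = [
    IndexEntry("metadata", "p. 12", Source.PUBLISHER),
    IndexEntry("marginalia", "p. 40", Source.READER),
    IndexEntry("thesaurus", "p. 12", Source.POST_PUBLICATION),
]
embedded, attributes = partition(entries)
print(len(embedded), len(attributes))
```

The `copy_originals` switch corresponds to the last point above: the attribute file may carry the original index as well, while the text never carries the additions.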
Regards,
Rick Barry
(http://www.openreader.org/)