Update posted to the TeleRead blog:
http://www.teleread.org/blog/?p=5866
Includes links to the latest DTD and online demonstration document in
the "SimpleBook" XML markup vocabulary.
A major design goal is to be able to autoconvert the "SimpleBook"
document to a conforming OpenReader Publication.
Jon Noring
At 8:48 PM -0700 11/27/06, Jon Noring wrote:
> > FYI, the below is a current thread on the Electronic Records (ERECS-L) list
>> re font requirements for conversion in the PDF/A standard.
>
>Thanks! Very informative.
>
>I admire the PDF/A standard for its no-compromising position on "doing
>it right".
>
We tried!
In addition, many of the requirements from PDF/A have been
incorporated into PDF/X and PDF/E as well - and are under
consideration for PDF/UA.
As the most strict of the ISO PDF specifications, it is
serving quite nicely as the "minimal requirements" for any other
specification. And its VERY rapid adoption by 3rd parties is
helping.
Leonard
--
---------------------------------------------------------------------------
Leonard Rosenthol <mailto:leonardr@...>
Technical Standards Evangelist 215-938-7080 (voice)
Adobe Systems
Rick Barry wrote:
> FYI, the below is a current thread on the Electronic Records (ERECS-L) list
> re font requirements for conversion in the PDF/A standard.
Thanks! Very informative.
I admire the PDF/A standard for its uncompromising position on "doing
it right".
Jon Noring
FYI, the below is a current thread on the Electronic Records (ERECS-L) list
re font requirements for conversion in the PDF/A standard.
Regards,
Rick
In a message dated 11/27/2006 4:20:41 P.M. Eastern Standard Time,
vjones@... writes:
Tom:
I forwarded your questions to the ISO Standards WG working on the standard.
See their responses below.
Ginny Jones
(Virginia A. Jones, CRM, FAI)
Records Manager
Information Technology Division
Newport News Dept. of Public Utilities
Newport News, VA
vjones@...
____________________________________
From: PDF-Working Group [mailto:PDF-WG@...] On Behalf Of
Macduff Hughes
Sent: Wednesday, November 15, 2006 6:07 PM
To: PDF-WG@...
Subject: Re: Questions about Old Standard PostScript printer fonts and
PDF/A-1a
On question 1:
Yes, to correctly convert PostScript files that reference those fonts to
PDF/A, you must have those fonts on the machine that is doing the PostScript to
PDF conversion. The material in the PDF Reference stating that certain fonts
do not need to be embedded is overridden by the PDF/A specification, which
requires that all fonts be embedded. Note 2 to 6.3.4 states: "There is no
exemption from the requirements of 6.3.4 for the 14 standard Type 1 fonts."
On question 2:
"Preview and Print Embedding" is sufficient; Editable Embedding is not
required.
On question 3:
The sort of thing described there can be done with a custom PostScript
prologue that remaps font names, providing the substitute font has identical
metrics to the original. I don't know offhand of any easily available tools to
do this.
Macduff Hughes
Adobe Systems, Inc.
____________________________________
From: PDF-Working Group [mailto:PDF-WG@...] On Behalf Of
Dwight Kelly
Sent: Wednesday, November 15, 2006 6:56 PM
To: PDF-WG@...
Subject: Re: FW: Questions about Old Standard PostScript printer fonts and
PDF/A-1a
1. Do I need to purchase and install all the old standard PostScript printer
fonts (Times, Helvetica, Helvetica Narrow, New Century Schoolbook, Palatino,
Bookman, Avant Garde, Zapf Chancery Medium Italic, and Zapf Dingbats) on the
machine used to convert any PostScript files that include reference to those
fonts so they get embedded and result in readable PDFs that print and
display properly? I understand that Courier, Helvetica, Times, Symbol, and
Dingbats never need to be embedded since they are supposed to be part of the
reader (p. 16, PDF Reference,
http://partners.adobe.com/public/developer/en/pdf/PDFReference16.pdf) but
programs like Distiller still apparently need Symbol to create the PDF (or it
will substitute Courier).
Yes, you will need to purchase and install copies of the "Base 14" fonts.
PDF/A requires all fonts be embedded.
2. Does the embedding requirement for PDF/A-1a work with the Adobe fonts
that are "Licensed for Preview and Print Embedding" (which looks like most of
the old fonts above -
http://www.adobe.com/type/browser/legal/embeddingeula.html), or does
the PDF/A-1a standard require use of the less-restrictive licensing for the
fonts listed as "Licensed for Editable Embedding" (the Adobe Originals fonts)?
"Licensed for Preview and Print Embedding" should be fine for PDF/A creation
and use. Subsetting fonts will reduce file size and also prevent extraction
of the embedded fonts. Subsetting can also help prevent text editing.
3. Are there any open source font configurations that allow converting PS or
MS Office documents which include proprietary fonts to similar-looking fonts
that can be embedded to make readable and printable PDF/A-1a documents? (so
that a fairly long list of "bad, proprietary" fonts could be intelligently
mapped to similar open source fonts and then embedded in the PDF) If so, how
could this mapping be done with programs like Adobe Distiller or LiveCycle PDF
Generator?
I don't know of a program that can substitute fonts after the PDF has been
created.
--
Dwight Kelly
____________________________________
From: PDF-Working Group [mailto:PDF-WG@...] On Behalf Of
Jones, Virginia
Sent: Wednesday, November 15, 2006 1:17 PM
To: PDF-WG@...
Subject: FW: Questions about Old Standard PostScript printer fonts and
PDF/A-1a
Can anyone help this person with their questions?
Ginny Jones
(Virginia A. Jones, CRM, FAI)
Records Manager
Information Technology Division
Newport News Dept. of Public Utilities
Newport News, VA
vjones@...
____________________________________
From: Management & Preservation of Electronic Records
[mailto:ERECS-L@...] On Behalf Of Mangano, Thomas J (GE,
Research)
Sent: Wednesday, November 15, 2006 1:30 PM
To: ERECS-L@...
Subject: Questions about Old Standard PostScript printer fonts and PDF/A-1a
All,
A colleague who is establishing our procedures for long-term preservation of
documents following the PDF/A-1a standard asked the following questions
about fonts that I am unable to answer. Can anyone provide any help with these
questions?
1. Do I need to purchase and install all the old standard PostScript printer
fonts (Times, Helvetica, Helvetica Narrow, New Century Schoolbook, Palatino,
Bookman, Avant Garde, Zapf Chancery Medium Italic, and Zapf Dingbats) on the
machine used to convert any PostScript files that include reference to those
fonts so they get embedded and result in readable PDFs that print and
display properly? I understand that Courier, Helvetica, Times, Symbol, and
Dingbats never need to be embedded since they are supposed to be part of the
reader (p. 16, PDF Reference,
http://partners.adobe.com/public/developer/en/pdf/PDFReference16.pdf) but
programs like Distiller still apparently need Symbol to create the PDF (or it
will substitute Courier).
2. Does the embedding requirement for PDF/A-1a work with the Adobe fonts
that are "Licensed for Preview and Print Embedding" (which looks like most of
the old fonts above -
http://www.adobe.com/type/browser/legal/embeddingeula.html), or does
the PDF/A-1a standard require use of the less-restrictive licensing for the
fonts listed as "Licensed for Editable Embedding" (the Adobe Originals fonts)?
3. Are there any open source font configurations that allow converting PS or
MS Office documents which include proprietary fonts to similar-looking fonts
that can be embedded to make readable and printable PDF/A-1a documents? (so
that a fairly long list of "bad, proprietary" fonts could be intelligently
mapped to similar open source fonts and then embedded in the PDF) If so, how
could this mapping be done with programs like Adobe Distiller or LiveCycle PDF
Generator?
Tom Mangano
GE Global Research
Niskayuna, NY USA
Everyone,
A few here may be interested in the TeleRead blog article I wrote
which provides a different perspective on the IDPF OCF 1.0 container
standard:
http://www.teleread.org/blog/?p=5725
Of course, OpenReader is briefly mentioned in the article.
Yes, my article is fairly critical, though not of the OCF standard itself
which, other than a couple nitpicky things that I don't describe in
the article, is a pretty good spec, and I am proud to have been a part
of its development.
Rather, I am critical of how the OCF spec is being overly grandstanded,
which may lead some to believe it will solve the Tower of eBabel, or
provide the cure for cancer. It won't. The IDPF press releases give
the appearance that IDPF is doing something substantive about the Tower of
eBabel, but they really aren't. The OCF spec is important, but its
role in solving the format problem is limited since OCF is not really
an ebook format -- it is simply a wrapper for ebook formats, and it is
what is inside the wrapper which is critical.
Note: the OpenReader Container Format (ORCF; a preliminary draft is
mostly done and will be posted here) will be compatible with the OCF
spec in all the important ways, but cannot conform to it for one
obvious reason. We must also add a container identifier. Luckily, OCF
allows adding namespaced attributes to the container.xml file, which
we will use to assign the container ID (like an ISBN). For reasons of
robust interpublication linking (which resists breakage), the container ID
*must* be separate from the OpenReader Publication ID (they may be
assigned the same value, but don't have to be).
Anyway, compatibility is good enough for us (we don't need to be 100%
conformant and we will never tout "conformity"), since the same tool
sets used to create and extract the files from an OCF container can
trivially be adapted to do likewise for ORCF files. That's good enough
for real world authoring and reading systems. After all, OCF is simply
a ZIP file.
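
To illustrate the "OCF is simply a ZIP file" point, here is a minimal Python
sketch (not taken from any spec) that writes a tiny container and reads back
the root document path and a container ID. Only the META-INF/container.xml
location and its namespace follow OCF; the `orcf` namespace URI, the
`container-id` attribute name, the mimetype string and the media type are
assumptions made up for this illustration.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by OCF's META-INF/container.xml.
OCF_NS = "urn:oasis:names:tc:opendocument:xmlns:container"
# Hypothetical namespace for an ORCF container ID attribute (illustration only).
ORCF_NS = "http://example.org/ns/orcf"

CONTAINER_XML = f"""<container version="1.0" xmlns="{OCF_NS}"
    xmlns:orcf="{ORCF_NS}" orcf:container-id="urn:example:container:0001">
  <rootfiles>
    <rootfile full-path="myantonia.orb" media-type="application/xml"/>
  </rootfiles>
</container>"""


def build_container() -> bytes:
    """Write a tiny container into an in-memory ZIP file."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        # OCF puts an uncompressed 'mimetype' entry first; this value is made up.
        zf.writestr("mimetype", "application/x-example-orcf+zip",
                    compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/container.xml", CONTAINER_XML)
        zf.writestr("myantonia.orb", "<binder/>")  # placeholder Binder document
    return buf.getvalue()


def read_container(data: bytes):
    """Return (container ID, path of the root document) from a container."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        root = ET.fromstring(zf.read("META-INF/container.xml"))
    rootfile = root.find(f"{{{OCF_NS}}}rootfiles/{{{OCF_NS}}}rootfile")
    return root.get(f"{{{ORCF_NS}}}container-id"), rootfile.get("full-path")


container_id, root_path = read_container(build_container())
print(container_id, root_path)
```

Nothing here goes beyond an ordinary ZIP library plus an XML parser, which is
why the same tooling should adapt easily from OCF to ORCF.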
Btw, it is the intent, as I understand it, for the IDPF OCF spec to
evolve into a general container format not tied to any specific
publication format such as OEBPS. There is talk that the ODF and IDPF
OCF folk will join forces to hammer out a next version container spec
usable in generic fashion. I hope this happens. The OCF spec should
have been generic from the start, since I can see it being used for all kinds
of containment of published media (texts, video, audio, etc.) There's
nothing better to get a spec really embraced than for it to meet the
needs of a wide range of groups, rather than a narrow focus as it is
now (e.g., the requirement an OCF must contain an OEBPS Publication.)
Jon Noring
A few may be interested in the TeleRead blog article I wrote
which "deconstructs" the "epub" format used in Adobe's Digital
Editions preview:
http://www.teleread.org/blog/?p=5702
Of course, OpenReader is mentioned in the article.
Jon Noring
Rick Barry wrote:
> Indexing is a hugely important type of metadata,...
Rick's reply is an outstanding summary of several issues
related to indexing.
What I found of most interest is looking at author-supplied indexing,
publisher-compiled indexing, and user-generated indexing of digital
publications.
From the embedded indexing point of view, I lump the first two
together, since standardizing embedded indexing is about the mechanism
itself -- how to enable it. OpenReader need not focus on
who does the indexing. Ultimately, we simply want the spec not to get
in the way of how an actual index is embedded.
However, it is clear that indexing by end-users should not be embedded
within publications, but instead will be like annotations -- an index
item will point to within a publication. The reason is simple -- once
an author or publisher releases a publication, we have to assume that
the publication is "sealed" from any modification by those not
authorized to modify it.
This brings us into the other area of interest to me: standardization
of the annotation of OpenReader Publications and publications in other
similar XML-based frameworks (such as OEBPS). As I've written before,
annotation is a very broad term that includes what many think of as
annotation (adding a note), bookmarking, highlighting, and yes, adding
an "index item". They are all similar -- maybe a term other than
"annotation" is needed to lump together all these different items.
Jon Noring
Indexing is a hugely important type of metadata, in my opinion. I suppose
that the context in which we are carrying out this discussion is mostly in
regard to indexes appearing at the back of books in the form of subject, author
and/or maybe URL indexes. It seems to me that indexing sets can be thought of
at a number of levels in this context. I think it would help to make some
distinctions among some of these levels, and also among embedding in the book
text, embedding with the text but in a separate attribute file, and storing
in non-embedded attribute databases.
The first level that comes to mind is the original author-supplied index
that is often, but not always, published as part of a book at its end. I've not
personally seen a novel with such an index.
Second, which I find is more often the case, is original publisher-supplied
indexing. Since most common authoring tools, e.g. MSWord, do not have serious
indexing tools, authors often depend on the publisher to provide indexing at
the end of the book.
I sometimes do professional book reviews and usually comment on not only the
content, but the structure (TOC, chapter organization) and context (index,
chapter titles). I have found that in most cases, I give the books very poor
grades on the latter two scores, especially indexes. They are typically too
short, poorly done, fail to include some of the pages for listed index terms,
omit altogether some important index terms, even ones that appear frequently
in the text, etc. They also lack thesaurus treatment, which leads people to
information through conceptually related terms that are well used in the
field and whose concepts, though not necessarily the terms themselves, appear
in the book. Thesaurus treatment can provide capabilities above the typical
full-text searching.
There are of course notable exceptions to the above shortcomings, such as
Tom Erl's books on Service-Oriented Architecture, which have both elaborate
TOCs and extensive indexes.
These considerations are not trivial of course, particularly with
professional books. Unlike novels that you pick up and read in a linear
fashion, most people buy professional books to read (at least the portions
that are of
special interest to them, assuming the TOC leads them to such places) but also
and sometimes more importantly to use as future references to go to when they
need to revisit a particular issue or subject area. That requires a very well
done index where topics are many and complicated, or an elaborate and well
organized TOC where topics are few and reasonably self contained, and ideally
both.
I was recently contacted by a frequently published author who is well
established in his field about a book he was writing. I asked him how he was
approaching the index. He quickly replied, to my surprise: Oh, I leave that to
the publisher. But clearly, the author's and publisher's interests and
incentives are different. In another recent personal experience, I had a
chapter being published in a book. Nearly 2 years after I wrote the chapter, I
offered the publisher an updated version of the material. The publisher said:
No. Since we would have to do the index all over again, we would prefer to use
the older material. Instead, they offered to place a note in my chapter
indicating that an updated version of the chapter was available on my website!
We agreed on that, and I had the updated version on my website with the
publisher's agreement before the book hit the streets.
The first two types of indexes noted above are related to hard-copy books
and electronic-only books. Which brings us to electronic publishing. Once we
consider publishing digitized versions of books, including any published
index, we keep the first two types but also open up post-publication
possibilities beyond them.
A third (post-publication) category is one provided by the author who
discovers after the fact that a lousy job was done on his/her book, OR by an
electronic publisher who adds value to the book by providing a much enhanced,
augmented index beyond that provided in the original hard copy version. This
could be embedded with the electronic book as an attached metafile or supplied
by the electronic publisher in its controlled separate metadata database along
with its other publications.
Fourthly, consumer-readers may want to add their own index terms that are of
particular relevance to their interests/needs/clients as a special kind of
personal annotation or digital marginalia.
The latter two categories may also provide interesting possibilities for
indexing fiction.
With these distinctions in mind, it seems to me that, for logical and
possibly IPR reasons, only original author/publisher-supplied indexing should
be embedded as part of the text. Additional indexing, e.g., reader-supplied,
or as might be done by an electronic publisher (including using an
OpenReader compliant system, though I understand that is not currently a part
of the OR design specs) should be provided only in an attribute file that is
not embedded in the text itself but which may or may not be embedded with the
text file. It may also make sense to include original
author/publisher-provided indexing along with such attributes.
Regards,
Rick Barry
(http://www.openreader.org/)
Peter Ring wrote:
> Please also include out-of-band indexterm entries. I've had a lot
> of utility from DocBook indexterm's zone attribute:
>
> http://www.docbook.org/tdg/en/html/indexterm.html#d0e96507
>
> The general idea is that you should be able to annotate content
> with indexterms without changing the content. Quite often, the
> index authoring is completely separate from the content authoring.
>
> Also be sure to include attributes or elements that allow hinting
> about the purpose of the indexterm, e.g. subject index vs. author index.
Again, interesting info.
The feedback I've been getting, from here and from Jon Jermey and
David Ream (who is one of the top experts in the area of XML and
indexing) is that the issue of embedded indexing is fairly complex as
one begins to peel away the layers. It does seem premature to
implement anything in OpenReader, at least in version 1.0, for
embedded indexing.
Interestingly, no one has apparently put together a comprehensive
generic XML vocabulary for embedded indexes (one that could be included in
any XML document with proper namespacing). So this leaves open some
sort of standardization in the area. Maybe this could be done in OASIS
with sponsorship by ASI (asindexing.org)? (I'll be talking with David
Ream next week for his feedback on this proposal.)
Anyway, here's my preliminary list of candidate requirements the
embedded indexing vocabulary should meet:
1) handle multiple indexes in the same publication. (that is, each
embedded index term is to be applied to one or more indexes when
generated/compiled by the reading system.)
2) define the range or scope of the index item (and it may have to
cross the natural hierarchy of the XML document -- Lee? :^) )
3) handle hierarchical terms (many indexes have 2 and even more
levels)
4) support "see", "see also", etc. (cross-referencing)
5) Peter's suggestion of "out-of-band" indexing. (Seems to imply using
XPointer to define the target and the associated scope/range.)
6) support "sort as" information (to tell the reading system how to
order the terms when the index is generated/compiled -- necessary
since some terms may be in other character sets than the primary
character set.)
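
As a rough illustration of what a reading system might do with such markup,
here is a minimal Python sketch covering only requirements 1, 3 and 6:
routing entries to more than one index, hierarchical terms, and "sort as"
keys. The `IndexEntry` record, its field names and the sample IDs are
hypothetical and do not come from DocBook, TEI or any OpenReader draft.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexEntry:
    index: str          # name of the index this entry feeds (requirement 1)
    terms: tuple        # term path, outermost level first (requirement 3)
    target: str         # ID of the place in the content the entry points to
    sort_as: str = ""   # optional sort key for the last term (requirement 6)


def sort_key(entry):
    """Case-insensitive key; 'sort_as' replaces the last term if given."""
    *parents, last = entry.terms
    return tuple(t.casefold() for t in parents) + ((entry.sort_as or last).casefold(),)


def compile_indexes(entries):
    """Group entries per index and term path, then order them by sort key."""
    grouped = defaultdict(dict)  # index name -> {terms: (sort key, [targets])}
    for entry in entries:
        _, targets = grouped[entry.index].setdefault(entry.terms, (sort_key(entry), []))
        targets.append(entry.target)
    return {
        name: [(terms, targets)
               for terms, (_, targets) in sorted(items.items(), key=lambda kv: kv[1][0])]
        for name, items in grouped.items()
    }


entries = [
    IndexEntry("subject", ("dog breeds", "Australian Terrier"), "term259"),
    IndexEntry("subject", ("Ångström unit",), "term300", sort_as="Angstrom unit"),
    IndexEntry("subject", ("dog breeds", "Australian Terrier"), "term412"),
    IndexEntry("author", ("Cather, Willa",), "term001"),
]
indexes = compile_indexes(entries)
for name, rows in indexes.items():
    print(name, rows)
```

Without the `sort_as` value, the "Ångström unit" entry would sort after
"dog breeds" under a plain code-point comparison, which is the kind of
problem requirement 6 is meant to address.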
Anything else?
And anyone here interested in being involved with standards work
for embedded indexing should anything get off the ground? (If so,
contact me in private.)
Jon Noring
Please also include out-of-band indexterm entries. I've had a lot of utility
from DocBook indexterm's zone attribute:
http://www.docbook.org/tdg/en/html/indexterm.html#d0e96507
The general idea is that you should be able to annotate content with indexterms
without changing the content. Quite often, the index authoring is completely
separate from the content authoring.
Also be sure to include attributes or elements that allow hinting about the
purpose of the indexterm, e.g. subject index vs. author index.
kind regards
Peter Ring
> -----Original Message-----
> From: openreader-format@yahoogroups.com
> [mailto:openreader-format@yahoogroups.com]On Behalf Of Jon Noring
> Sent: 5. oktober 2006 07:25
> To: openreader-format@yahoogroups.com
> Subject: Re: [openreader-format] "Book Indexing" markup in XML documents?
>
>
> Michael Day wrote:
> > Jon Noring wrote:
>
> >> Hmmm, maybe, I'll have to redig into DocBook. The section
> I looked at
> >> with regards to DocBook indexing did not appear to embed indexing
> >> information within the content documents, but simply defined how to
> >> build a "back of book index" with pointers to associated content
> >> (possibly id/idref pairs.) But then maybe I totally misinterpreted
> >> what I saw -- DocBook is one complicated markup vocabulary!
>
> > You place <indexterm> elements through the document, then
> at the end
> > place an empty <index/> element which is where the actual
> index will be
> > generated.
> >
> > Here is an article on mastering DocBook indexes [sic]:
> >
> > http://www.xml.com/pub/a/2004/07/14/dbndx.html
>
> Cool! That clarifies it. It also shows I was close, although the
> indexing terms in DocBook are part of content, while in my example,
> they were attribute values. The "sortas" attribute is also important,
> and suggests putting the index terms as content and not in attribute
> values.
>
> From other feedback, the indexing community calls what I'm looking for
> "embedded indexing", and next week I'll be consulting with one of the
> top experts who is very interested in embedded indexing. From what I
> gather, there's no "standard" markup, although DocBook's approach can
> be considered a "standard".
>
> Jon Noring
Syd Bauman wrote:
> Some limitations of the brief sample Jon provided and IIRC of DocBook
> (but lord knows, I am no DocBook expert):
>
> * Only provides for 1 index.
> * Definitionally limited number of subcategories (this might be a
> good thing, though :-)
> * Cannot declare the language of the index term as being different
> from the language of the content.
>
> TEI indexing takes care of all three of these concerns. See
> http://www.tei-c.org/release/doc/tei-p5-doc/html/CO.html#CONOIX.
Thanks! From my initial study, it appears TEI is a little more powerful
at representing embedded indexing information than is DocBook.
From my chats with a couple of indexers, the embedded index markup should
also handle cross-referencing, such as "see" and "see also" types of
information.
Jon Noring
Some limitations of the brief sample Jon provided and IIRC of DocBook
(but lord knows, I am no DocBook expert):
* Only provides for 1 index.
* Definitionally limited number of subcategories (this might be a
good thing, though :-)
* Cannot declare the language of the index term as being different
from the language of the content.
TEI indexing takes care of all three of these concerns. See
http://www.tei-c.org/release/doc/tei-p5-doc/html/CO.html#CONOIX.
Michael Day wrote:
> Jon Noring wrote:
>> Hmmm, maybe, I'll have to redig into DocBook. The section I looked at
>> with regards to DocBook indexing did not appear to embed indexing
>> information within the content documents, but simply defined how to
>> build a "back of book index" with pointers to associated content
>> (possibly id/idref pairs.) But then maybe I totally misinterpreted
>> what I saw -- DocBook is one complicated markup vocabulary!
> You place <indexterm> elements through the document, then at the end
> place an empty <index/> element which is where the actual index will be
> generated.
>
> Here is an article on mastering DocBook indexes [sic]:
>
> http://www.xml.com/pub/a/2004/07/14/dbndx.html
Cool! That clarifies it. It also shows I was close, although the
indexing terms in DocBook are part of content, while in my example,
they were attribute values. The "sortas" attribute is also important,
and suggests putting the index terms as content and not in attribute
values.
From other feedback, the indexing community calls what I'm looking for
"embedded indexing", and next week I'll be consulting with one of the
top experts who is very interested in embedded indexing. From what I
gather, there's no "standard" markup, although DocBook's approach can
be considered a "standard".
Jon Noring
> Hmmm, maybe, I'll have to redig into DocBook. The section I looked at
> with regards to DocBook indexing did not appear to embed indexing
> information within the content documents, but simply defined how to
> build a "back of book index" with pointers to associated content
> (possibly id/idref pairs.) But then maybe I totally misinterpreted
> what I saw -- DocBook is one complicated markup vocabulary!
You place <indexterm> elements through the document, then at the end
place an empty <index/> element which is where the actual index will be
generated.
Here is an article on mastering DocBook indexes [sic]:
http://www.xml.com/pub/a/2004/07/14/dbndx.html
Cheers,
Michael
--
Print XML with Prince!
http://www.princexml.com
Michael Day wrote:
> Jon Noring wrote:
>> Any thoughts, suggestions, recommendations, criticisms? Anyone here
>> know an expert I could consult with? It would not surprise me if
>> someone or some group has already designed something to do this, but I
>> haven't found that something yet.
> This looks rather like DocBook indexing markup.
Hmmm, maybe, I'll have to redig into DocBook. The section I looked at
with regards to DocBook indexing did not appear to embed indexing
information within the content documents, but simply defined how to
build a "back of book index" with pointers to associated content
(possibly id/idref pairs.) But then maybe I totally misinterpreted
what I saw -- DocBook is one complicated markup vocabulary!
Anyone?
Jon Noring
Hi Jon,
> Any thoughts, suggestions, recommendations, criticisms? Anyone here
> know an expert I could consult with? It would not surprise me if
> someone or some group has already designed something to do this, but I
> haven't found that something yet.
This looks rather like DocBook indexing markup.
Cheers,
Michael
--
Print XML with Prince!
http://www.princexml.com
Everyone,
As part of the effort to develop a set of special OpenReader namespace
elements which may be applied to any OR supported content document
vocabulary, I'm interested in established 'standards', or general
principles, for markup to add indexing information *directly* to
content.
This is NOT the same as building a "back-of-book" index which links to
points within documents, but rather the placing of the book indexing
information (the "metadata") right within the content, allowing
OpenReader user agents to extract the information and machine-build
linkable indexes, as well as do other stuff with the information.
For example, here's a paragraph we might find in a content document:
<p>My favorite breed of dog is the Australian Terrier.</p>
We might add indexing information as follows (note, this is NOT a
specific proposal, but simply something I pulled out of the air for
illustrating what I'm talking about):
<p>My favorite breed of dog is the
<or:indexitem xml:id="term259"
term="dog breeds -- Australian Terrier"/>
Australian Terrier<or:endindexitem idref="term259"/>.</p>
The element <or:indexitem/> assigns the index item term (metadata,
here "dog breeds -- Australian Terrier"), and the optional
<or:endindexitem/> simply closes the <or:indexitem/> to define the
range that the index term applies to.
[The main reason I use two empty tags to define the range the index
term applies to, rather than a single non-empty tag, is to allow the
range to cross XML hierarchies. If <or:endindexitem/> is missing for a
given <or:indexitem/>, then the index metadata applies at the point of
occurrence with unspecified range.]
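
To make the two-empty-tag idea concrete, here is a minimal Python sketch of
how a user agent might recover the text covered by each index item. It walks
the document in reading order, so a range may start and end in different
parent elements. The namespace URI bound to the `or:` prefix is a placeholder
invented for this example, and the markup is the illustrative one above, not
a proposal.

```python
import xml.etree.ElementTree as ET

OR_NS = "http://example.org/ns/openreader"      # placeholder, not a real URI
XML_NS = "http://www.w3.org/XML/1998/namespace"  # namespace of xml:id

DOC = f"""<p xmlns:or="{OR_NS}">My favorite breed of dog is the
<or:indexitem xml:id="term259"
   term="dog breeds -- Australian Terrier"/>
Australian Terrier<or:endindexitem idref="term259"/>.</p>"""


def walk(elem):
    """Yield elements and text chunks in document (reading) order."""
    yield ("elem", elem)
    if elem.text:
        yield ("text", elem.text)
    for child in elem:
        yield from walk(child)
        if child.tail:
            yield ("text", child.tail)


def index_ranges(root):
    """Map each index term to the text in its range (None if never closed)."""
    open_items = {}  # xml:id -> (term, list of text chunks seen so far)
    results = {}
    for kind, value in walk(root):
        if kind == "text":
            for _, chunks in open_items.values():
                chunks.append(value)
        elif value.tag == f"{{{OR_NS}}}indexitem":
            open_items[value.get(f"{{{XML_NS}}}id")] = (value.get("term"), [])
        elif value.tag == f"{{{OR_NS}}}endindexitem":
            term, chunks = open_items.pop(value.get("idref"))
            results[term] = " ".join("".join(chunks).split())
    for term, _ in open_items.values():
        results[term] = None  # no end tag: a point entry with unspecified range
    return results


ranges = index_ranges(ET.fromstring(DOC))
print(ranges)
```

A single non-empty element could not express a range that begins in one
paragraph and ends in another, which is why the sketch tracks open items by
ID rather than relying on element nesting.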
One complicating factor is representing index term hierarchy -- we may
need to apply terms in some hierarchical fashion, e.g.:
<or:indexitem xml:id="term259"
term1="dog breeds"
term2="Australian Terrier"/>
(We could even have term3, term4, term5, etc.)
Any thoughts, suggestions, recommendations, criticisms? Anyone here
know an expert I could consult with? It would not surprise me if
someone or some group has already designed something to do this, but I
haven't found that something yet.
Thanks.
Jon Noring
Everyone,
The OpenReader Consortium has issued the first demo OpenReader 1.0
Publication. This follows the release of the OpenReader Binder and
Content Document specifications, see:
http://www.openreader.org/spec/
The source text for this demonstration publication comes from Willa
Cather's "My Antonia", which I transcribed from a First Edition print
copy last year, and with error corrections (both transcriptional and
those known to exist in the printed edition) by Jose Menendez of
iBiblio, who has his own high-quality online version of "My Antonia":
http://www.ibiblio.org/ebooks/Cather/index.html
The OpenReader 1.0 Publication of "My Antonia" is downloadable from:
http://www.openreader.org/spec/examples/My_Antonia_OpenReader_1.0_14-Aug-2006.zip
Note that this Publication is contained within an ordinary zip file.
The XML file 'myantonia.orb' in the root folder is the Binder
document, the 'control center' of the whole Publication. It is similar
to, but much more advanced than, the OEBPS Package document.
We plan to soon recontain this Publication into the OpenReader
Container. The OR container will be compatible with the IDPF Container
and is essentially a zip file with a couple special files added.
Your feedback is requested and welcome.
Jon Noring
[A few here may be interested in the following I just posted to The
eBook Community. This thread has been kept separate from the TeBC
thread. Note, yesterday I published a TeleRead blog article covering
topics related to this thread. See: http://www.teleread.org/blog/?p=5067 ]
Ben Trafford wrote:
> Bill Janssen wrote:
>> I was just looking at the IDPF page called "Use Cases and Requirements
>> for Next Version of the Open eBook Publication Structure (OEBPS)" (at
>> http://www.idpf.org/doc_library/informationaldocs/oebps_requirements.htm).
>> I noticed that their schedule called for a meeting last week: "The
>> Working Group will meet at a face to face meeting on June 20th and
>> 21st in New York City to present and select technologies to satisfy
>> the following requirements."
>>
>> Does anyone know if anything came of this?
I was unable to attend the F2F (face-to-face) meeting, either in
person or by teleconference, but did submit some comments prior to
the F2F on a couple topics. (I'm an "invited expert" to the group.)
Garth Conboy, the chair of the OEBPS WG, just sent out the minutes
plus a first "working document" which resulted from the decisions made
at the F2F. However, that information has not yet been made public.
Overall, and *at first glance only*, I'm comfortable with most of the
decisions that were made at the F2F. I'm happy for the acknowledgement
of the importance, and preservation, of out-of-spine content. I'm also
happy that the DTBook document type has been accepted as an
alternative content document vocabulary, although there no doubt needs
to be some integration work (OpenReader plans to support the same
document type now that the fundamental architecture is in place to
readily allow multiple vocabularies.) The embrace of the NCX (the
DAISY navigation control file) is also good, although for the long-term
I think the OpenReader approach is technically superior and more
flexible and will work for the accessibility community (it, too, could
no doubt be improved.)
By and large, OEBPS will be updated, but it does trouble me that
these changes are being done sort of like adding after-market parts to
a car. I really believe the time should have been spent to revitalize
the OEBPS 2.0 effort, to update the Roadmap, and then to work on the
next generation first, which would lay a good base upon which to
build the spec for the next decade.
Or, if the only intent was to "modernize" the current OEBPS 1.2, the
changes should have been smaller, meaning the new spec could've been
released in as little as 2 months, which would have sped things up
and given the ebook industry something it could use. There's no
compelling need to have SVG and embedded fonts *today*.
> No idea, but I find their requirements document opaque and
> unreadable. Nothing is explained. Bullet points do not provide for an
> informative document.
I've thought, too, that the OEBPS WG charter is too vague.
It also does not properly acknowledge that the original PSWG Roadmap
(developed from 2000 to 2003) was to be the starting point for all new
work. In essence, the PSWG Roadmap has been quietly tossed aside
without much discussion, and nothing has been put in its place.
How can one make decisions for the short-term without a longer-term
vision and plan? You *can't*, yet that is what OEBPS WG is chartered
to do.
The decision of what items to add to OEBPS was done essentially
arbitrarily (although Garth might disagree) based on the request of,
I surmise, Adobe (they are definitely hot-to-trot to get both embedded
fonts and SVG in the spec, which is a good idea to do *at the right
time*, but adding those should have come *after* a comprehensive
evaluation and updating/amendment of the Roadmap. "A" must come
before "B".)
I know several of the PSWG old-timers were and are quite perturbed by
how the OEBPS WG charter was created and approved, as well as
wondering why the original PSWG was dechartered -- no reason was ever
publicly given. They should have simply revived PSWG. It would have
maintained better continuity with the past, even if perceptual.
It is indicative that very few of the quite active PSWG old-timers are
still around contributing to the new OEBPS spec: those still here
include Garth Conboy, Brady Duga (Garth and Brady come as a team
<smile/>), Steve Kotrch, George Kerscher, and yours truly. There were
quite a few other PSWGers who made substantive and even major
contributions to the original OEBPS who are no longer around -- a few
of them are thoroughly disillusioned, to put it mildly, by the lack
of support of standards work by OeBF (now IDPF) back in 2002-3.
They don't want to have anything to do with anything "ebook", not even
OpenReader.
That's a *huge* waste of great talent that was blown off by a lack
of vision and bad policies in 2002-3. By and large, the same
leadership that ran OeBF then is still leading IDPF today, so their
sincerity in catching "standards religion" is certainly questionable.
>> By the way, what do people think of IDPF as a representative standards
>> body in this area? Just looking at their membership rolls
>> (http://www.idpf.org/membership/currentmembers.asp), it seems a pretty
>> reasonable mix of publishers and technologists, along with a few
>> outliers (like the Boy Scouts of America).
> I have the same issues with IDPF that I had with OEBF back
> when I was on that body's board of directors. It's not open enough.
> Why are we forced to sit in the dark, waiting for the wisdom from on
> high? Where is the public review of specifications? This ebook
> community here could operate much like the XML-DEV mailing list did
> for XML, back in the day. It could be a place to get ideas outside of
> the box and to really let people have a good stab at the problems
> with the standards they're proposing.
Well, there is supposedly a new commitment to openness, but this
requires not only lip service, but action as well. The action to open
up all deliberations is slow to be enabled. First, the current OEBPS
WG mailing list is still operated internally to IDPF (thus requiring
someone at IDPF to babysit it), and as such does not yet have a
searchable, public archive. The discussion group should be moved as
soon as possible to a publicly open YahooGroup or GoogleGroup, where
everything is archived and publicly searchable, and files can also be
shared.
Why IDPF does not take advantage of external resources to enable
openness, and instead does things all internally (which costs $$$ and
time, mostly Nick's time which is better spent doing something else),
is beyond me. Nick, are you reading this?
> As it is? The only reason I don't toss rocks at IDPF from
> afar is because of the respect I have for people like Garth Conboy. I
> really wish they'd open up more, and really work at a little more
> public outreach. Putting up press releases on their website isn't
> enough. There needs to be community involvement; ebooks are not so
> advanced that grassroots work is out of the question.
Yes, there are individuals still associated with IDPF who I highly
respect, and this includes, among a few others, Garth, Brady (the
dynamic duo), Nick Bogaty, and George Kerscher who still hangs
around IDPF (although he was defeated when he ran again for the
Board, which is a tragedy.) But there are institutional issues.
Fortunately, too, IDPF has relaxed their policy regarding invited
experts, but they are not making any effort to recruit outside
tech-types to contribute to the specs.
I really do think IDPF should spin off its spec effort to OASIS,
for several reasons I won't get into here (I make the same call in the
TeleRead blog article I posted yesterday -- see link below.)
Jon Noring
[Note, yesterday I published a TeleRead blog article covering topics
related to this thread. See: http://www.teleread.org/blog/?p=5067 ]
--- Roger Sperberg wrote:
>
> I don't believe that OpenReader as currently proposed is a
> legitimate reason to splinter the OEB format development. (I've
> expressed this sentiment to Jon Noring before and in blogs at
> Teleread.)
>
> I reiterate my position at Teleread in a post entitled "OpenReader
> manifesto."
>
...
>
> "OpenReader manifesto" http://www.teleread.org/blog/?p=5011
>
Please note that Jon Noring has written a response also published at
Teleread:
‘OpenReader manifesto’ ignores business realities of e-book industry
http://www.teleread.org/blog/?p=5012
Preferring the public fora to this group, I have written an essay
posted at Teleread called "The case against HTML in OpenReader":
http://www.teleread.org/blog/?p=5025
Roger Sperberg
I haven't posted to this list because I think its validity is in question.
I don't believe that OpenReader as currently proposed is a legitimate
reason to splinter the OEB format development. (I've expressed this
sentiment to Jon Noring before and in blogs at Teleread.)
I reiterate my position at Teleread in a post entitled "OpenReader
manifesto." [1]
Although Jon Noring has said before that OpenReader could be developed
as a specification under the auspices of an OASIS Technical Committee,
I do not believe an OpenReader project charter would be accepted as
the plans currently exist. OASIS states explicitly that it will not
sponsor spec-creation for alternatives to existing standards.
As I understand him, Jon would delay most or all of the features I've
discussed until a future version of OpenReader; his opposition, I
believe him to be saying, is one of timing and possibly practicality
rather than of good or bad implementation.
At any rate, despite Jon's launching of the OpenReader initiative, I
believe it must bite off considerably more than he proposes. My
thoughts, as indicated, can be found at Teleread.
Roger Sperberg
[1] "OpenReader manifesto" http://www.teleread.org/blog/?p=5011
[Note to the OpenReader format group: I just posted the following to
the new IDPF OEBPS Working Group of which I'm an invited contributor.
I've been a contributor to all versions of OEBPS. Anyway, you are
welcome to provide feedback, as you see it, on the "out-of-spine"
issue, both supportive and critical of my position. It is relevant to
the OpenReader Publication Framework. Thanks! Jon]
Everyone,
Since the "out-of-spine" feature of the current OEBPS spec is
curiously (and troublingly so) on the slate for discussion at the
Face-To-Face, and because I am unfortunately unable to attend the F2F
in person, I want to start dialogue well in advance on this topic, and
give some of my thoughts on "out-of-spine."
I'll touch upon why it is critical to maintain and to even expand
this feature, rather than remove it, as I fear some might want. (If it
is to be removed, it needs to be replaced by another mechanism of
equal or greater power, which no one has so far proposed.)
I am tentatively planning to publish an essay on the TeleRead blog
(and elsewhere) concerning some of the topics I'll raise below.
What Is "Out-Of-Spine" Content?
-------------------------------
In all versions of OEBPS 1.x published to date, content documents
listed in the manifest may be referenced from the Spine, Tours, and
the Guide (and by implication other content documents), and these
content documents need not be in the Spine. Content documents not part
of the Spine are referred to by the PSWG "old-timers" (but not in the
spec) as "out-of-spine" content. For some background, read Section 2.4
of OEBPS 1.2:
http://www.openebook.org/oebps/oebps1.2/download/oeb12-xhtml.htm#sec2.4
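[Editorial sketch: the definition above amounts to a set difference, namely
the items declared in the manifest minus the items referenced from the
spine. The Python below illustrates this on a stripped-down package
document. The sample is an assumption made for the example: it omits the
namespace, metadata, and other parts a real OEBPS package file carries, the
file names are invented, and for simplicity it treats every manifest item
as a content document.]

```python
import xml.etree.ElementTree as ET

# Stripped-down, hypothetical package document (not a complete OEBPS file).
SAMPLE_PACKAGE = """\
<package>
  <manifest>
    <item id="ch1" href="chapter1.html" media-type="text/x-oeb1-document"/>
    <item id="ch2" href="chapter2.html" media-type="text/x-oeb1-document"/>
    <item id="notes" href="notes.html" media-type="text/x-oeb1-document"/>
  </manifest>
  <spine>
    <itemref idref="ch1"/>
    <itemref idref="ch2"/>
  </spine>
</package>
"""


def local_name(tag):
    """Drop any '{namespace}' prefix that ElementTree puts on a tag name."""
    return tag.rsplit("}", 1)[-1]


def out_of_spine(package_xml):
    """Return hrefs of manifest items that the spine does not reference."""
    root = ET.fromstring(package_xml)
    manifest = {}      # item id -> href
    spine_ids = set()  # ids referenced from the spine
    for element in root.iter():
        name = local_name(element.tag)
        if name == "item":
            manifest[element.get("id")] = element.get("href")
        elif name == "itemref":
            spine_ids.add(element.get("idref"))
    return sorted(
        href for item_id, href in manifest.items() if item_id not in spine_ids
    )


if __name__ == "__main__":
    print(out_of_spine(SAMPLE_PACKAGE))
```

[In the sample, 'notes.html' is declared in the manifest but absent from
the spine, so it is the one out-of-spine document.]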
Unfortunately, OEBPS is silent regarding how reading systems are to
handle out-of-spine content, but then it is also silent (as far as I
can determine) on whether content documents in the Spine are also to
be rendered when accessible by the user. It is simply assumed
throughout the spec that if the content documents are there, whether
they are in the Spine, or out-of-spine, and accessible to the user
either directly or indirectly via the Spine, Tours and/or Guide
facilities, the reading systems *will* render them on user demand.
For example, if I built an OEBPS reading system, and simply refused to
render all the content documents in the Spine starting with the letter
"a", then from what I gather, the spec allows me to do so. But is that
right? Of course not. There are moral expectations underlying the
entire OEBPS specification which are not explicitly stated since they
are, or should be, patently obvious.
Same for out-of-spine content. If the publisher places it there, and
references it from the Spine as allowed by the spec, then the
publisher has an expectation that all reading systems will render it.
And so long as "out-of-spine" is supported in OEBPS, any reading
system which ignores out-of-spine content, when present in the OEBPS
Publication input into the Reading System, DOES NOT conform to the
OEBPS specification.
Original Intent
---------------
I believe the majority of the original OEBPS authors intended
out-of-spine content to be rendered when referenced from the Spine,
Tours, or Guide -- and to be rendered in innovative ways. Why? The
reason is that, in reality, most publications are non-linear in
content organization, and that the digital publication reading systems
of the future must not be restricted to the physical limitations of
ink-on-paper, but rather should present non-linear content in ways
that best present the content as it really is. It is important, then,
that OEBPS provides the facilities to represent the non-linearity of
publications, and allow reading systems to likewise innovate for
presenting non-linear publications.
For example, imagine if someone passed a rule saying that the
content of a web "site" must all be placed into a single home page?
Sounds silly, but that's exactly what OEBPS would be if we did not at
least have the out-of-spine feature. And note carefully that a "web
site" *is* a type of digital publication. Do we want the OEBPS of the
future to be throttled so it can't represent the structure of a
typical web site publication, like a snapshot of the Wikipedia? Or how
about hypertext literature?
We must be striving to better represent, digitally, the actual
organization of content, and to provide the means by which reading
systems may innovate to more accurately present such content, with
greater reader comprehension and end-user convenience. I consider the
out-of-spine feature to be the *start* of representing the true
nature of digital publications.
Since it will be brought up whether anyone has used the OEBPS
out-of-spine content, consider Microsoft. Their Reader beautifully
implements OEBPS out-of-spine, and in a way intended by the original
OEBPS authors. In its implementation of OEBPS (Microsoft Reader), an
"out-of-spine" content document is rendered in a pagelet, much like a
"super popup window" -- it's an innovative and powerful feature (and
one which Microsoft curiously never promoted, and I don't even recall
it being mentioned in the several MS Reader manuals that I've
consulted!)
As an ebook publisher in the LIT format, I use the Reader
"out-of-spine"/pagelet feature in all of my OEBPS Publications (which
are then converted to LIT) and have received many positive comments as
to the utility of the pagelet facility enabled by OEBPS out-of-spine
content (I place all notes, auxiliary content, and full-size images in
"out-of-spine".)
As an ebook publisher, I can't imagine NOT having the "out-of-spine"
feature, which is one reason I have not been motivated to publish my
books in the other standard "linear" formats.
And since it will also be brought up whether anyone besides myself is
creating LIT files with pagelets, this is being done not infrequently
according to one fairly major conversion house I spoke with.
The issue we now face is whether we should affirm the "out-of-spine"
feature in "OEBPS-next", or remove it entirely. I see no middle
ground since there should not be any middle ground on something as
critical and important as this.
I don't believe a strong argument can be made to remove it. And I
don't believe that a statement saying "reading systems should support
out-of-spine" (or similar statement) is sufficient to guide reading
system developers, who, if past history is any indication (with
Microsoft being the notable exception), will ignore the "should"
because it is inconvenient for them (I've studied how user agents may
present "out-of-spine" content, and it is NOT a burden on user agent
developers by any stretch of the imagination.)
This is not an issue of inconvenience to reading system developers,
but of what is best for the long-term growth of the digital
publication industry. Remove the out-of-spine feature, and this will
set back the digital publication industry by several years.
What I Propose
--------------
So, here's what I propose that we add to OEBPS-next while keeping
the rest of what we say about the Spine, the Manifest, etc., intact
(focus on the meaning of what is being said below, and not on the
specese which I know Garth can certainly improve!):
"Reading Systems *must* render, upon user demand, a content
document declared in the Manifest when referenced, directly or
indirectly by chained hypertext links, via:
a) Spine,
b) Tours [assuming we continue support for Tours],
c) Guide [assuming we continue support for Guides],
d) Another content document declared in the Manifest and
similarly referenced, and
e) Navigation Set.
The exception is a content document which is not accessible to the
end-user because of rights or network restrictions."
[see postscript at end of this message]
Note the focus on *all* documents in the Manifest, including both
Spine and "out-of-spine". Clearly we should state this general rule,
and it should apply to *all* content documents in the Manifest with no
restrictions of any kind. If a content document is reachable via
hypertext links, directly or indirectly, from the initial entry into
the Spine, then reading systems *must* render it (except when
inaccessible to the end-user as noted.)
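[Editorial sketch: the proposed rule can be read as a reachability test.
Starting from the entry points (Spine, Tours, Guide, Navigation Set),
follow hypertext links from document to document; every manifest document
reached this way must be rendered on demand, unless it is inaccessible to
the end-user. The Python below is one possible reading, not spec text. The
function name, the data shapes, and the demo file names are assumptions,
as is the choice not to follow links out of an inaccessible document.]

```python
from collections import deque


def must_render(manifest, entry_points, links, inaccessible=frozenset()):
    """Return the manifest documents reachable by chained links.

    manifest      -- set of document names declared in the Manifest
    entry_points  -- documents referenced from Spine, Tours, Guide, etc.
    links         -- dict mapping a document to the documents it links to
    inaccessible  -- documents blocked by rights or network restrictions
    """
    reachable = set()
    queue = deque(entry_points)
    while queue:
        doc = queue.popleft()
        # Skip repeats, targets outside the Manifest, and blocked documents.
        if doc in reachable or doc not in manifest or doc in inaccessible:
            continue
        reachable.add(doc)
        queue.extend(links.get(doc, ()))
    return reachable


# Hypothetical publication: two spine chapters, a note, a full-size image
# page reached only through the note, and one document nothing links to.
DEMO_MANIFEST = {"ch1.html", "ch2.html", "notes.html", "image-page.html", "orphan.html"}
DEMO_ENTRY_POINTS = ["ch1.html", "ch2.html"]
DEMO_LINKS = {
    "ch1.html": ["notes.html"],
    "notes.html": ["image-page.html"],
    "orphan.html": ["ch1.html"],
}

if __name__ == "__main__":
    print(sorted(must_render(DEMO_MANIFEST, DEMO_ENTRY_POINTS, DEMO_LINKS)))
```

[Under this reading, 'notes.html' and 'image-page.html' fall under the
"must render" rule even though neither is in the Spine, while 'orphan.html'
does not, because no chain of links leads to it.]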
The nice thing about this statement is that it allows us to expand
OEBPS support for non-linear content in the future using other
mechanisms, such as the "web paradigm."
So, that is the proposal I am submitting to OEBPS WG. And of course if
I do publish a more general essay on this topic on TeleRead (which is
the most popular ebook blog on the Internet), I will further address
the why's, and the advantages, of "out-of-spine" content to both
publishers and end-users.
Note that the needs or convenience of reading system developers do
not trump the right thing that the spec needs to promote,
especially since implementing the "out-of-spine" feature is actually
relatively easy for user agent developers, as my study of this issue
has led me to conclude. (E.g., if Mozilla is used for rendering,
place "out-of-spine" content into a separate browser window with
particular window parameters. I've even thought how one could
implement "out-of-spine" rendering in a limited PDA platform, and it
is doable in a way the reader knows it is "out-of-spine".)
Jon Noring
[p.s. Certainly by the proposed statement a Reading System which
controls the authoring of OEBPS Publications submitted to it can
decide whether or not to support the OEBPS "out-of-spine" feature
since it controls what is input to it. But why would they want to do
this when they can easily support "out-of-spine" and provide a better
reading experience for end-users? It certainly works against the
purpose of "out-of-spine" and the long-term benefit that representing
non-linear publications has for everyone in the digital publication
industry.
And of course what would the Reading System vendor tell publishers who
are building general-purpose OEBPS Publications for Reading Systems
that support "out-of-spine" -- that the publisher "has to linearize"
their OEBPS just for them? As a publisher, I would not like this --
I wonder about *their* competency. And telling publishers they should
never place "out-of-spine" content in the Manifest because [insert a
reason here...] is equally troubling -- I call it thumbing one's
nose at the spirit and goals of the OEBPS specification which is
striving to bring both conformity and advanced features to the digital
publication universe.
We must include "out-of-spine" in the spec, and do what we can to
encourage publishers to use it, and Reading System developers to
support it. This cannot be accomplished by keeping "out-of-spine"
in the closet as it has been up to now. Giving requirement levels
such as "should" does not pass muster as the right thing for IDPF
to do -- all it does is to continue to confuse the issue for
everyone -- it is effectively the same as removing "out-of-spine"
support entirely.]
Nick Bogaty wrote:
>Dear eBook Community List,
>
>The IDPF's Unified OEBPS Container Format Working Group released a working
>draft specification, OEBPS Container Format (OCF) 0.6, for public
>distribution today. The OEBPS Container Format (OCF) 0.6 working draft
>specification can be found at:
>
>http://www.idpf.org/doc_library/informationaldocs.htm
>
>
As many of you know, the IDPF, née OEBF, has recently started an effort
to create a single-file encapsulation format for e-books conforming to
the Open eBook Publication Structure
(http://www.idpf.org/oebps/oebps1.2/index.htm). The goals of the format
are two-fold: to provide a single file that can encapsulate all the
disparate files and formats that make up an OEBPS publication, and to do
so in a way that a conforming publication can be rendered on any User
Agent that supports the format. Needless to say, some of the guiding
principles behind the effort are that the format should, to the greatest
extent possible, use existing technologies; be open for anyone to study
and use, without restrictions; and be free from claims of intellectual
property.
As noted above by Mr. Bogaty, a first draft of the specification has
been published for review and comment. While the IDPF considers itself a
"standards organization", it has had little experience facilitating
public input on its proposed specifications, and unfortunately does not
yet have any effective means for accepting feedback or promoting public
discussion.
On April 25 Bill McCoy blogged about the new container specification,
and has graciously allowed us to use the comments section of that blog
entry to discuss the new specification. I would encourage everyone who
is interested in e-book formats to review the draft specification and
post any comments you may have, whether about the technical details of
the specification or about the presentation of those details, as
comments to Bill's blog entry at
http://blogs.adobe.com/billmccoy/2006/04/open_container.html.
Hello Jon,
thanks for the answer. I guess you will send out an e-mail as soon as
your sample(s) is/are downloadable, right?
I also plan to publish all of my e-book editions in the OpenReader
format. As mentioned earlier, I have about 230 titles, and all of them
have already been converted/published in the OEBPS format for the
earlier Gemstar readers. Today we still convert our titles with
that old software ("Gemstar eBook Publisher") prior to downgrading
them to the Mobipocket format. To this very day the "Gemstar eBook
Publisher" software is very useful, and I hope that OSoft is not just
working on their DOTREADER but also on plug-ins for the convenience of
conversion. Do you have any information on this?
Best Regards
eBookMedia.de - Jörg Morgan Wrobel
On 17.05.2006 at 21:52, Jon Noring wrote:
> Jörg Morgan Wrobel asked:
>
> > I would like to ask you if there will be a sample eBook edition in
> > the OpenReader Format, which everybody can download from your website
> > and use as a kind of guideline or template to create own eBook
> > editions. This might be very helpful for beginners. I do not think
> > that anyone would not welcome one or maybe some. To be really useful
> > the sample file should have examples of all possibilities of the
> > OpenReader format.
>
> The answer is an emphatic yes!
>
> The first OpenReader format sample I will put together will likely be
> "My Antonia". Since I want to demonstrate the feature of out-of-spine
> content, plus the OpenReader namespace special elements, such as page
> break and line break, I plan to ask to use Jose Menendez' version,
> rather than the one I've been working on as a demonstration
> illustrating some principles (mine does not include line breaks from
> the original book, and Jose has added corrections, including a few not
> previously recognized by Cather scholars, that thrive in the
> OpenReader framework -- OpenReader is built to handle non-linear
> content.) I'd like to ask for help with CSS styling for this book, and
> this includes supporting multiple Style Sets, so we're not restricted
> to just one particular presentation.
>
> I hope to also take someone's XML-standards web site and put that into
> OpenReader, since OpenReader is also designed to represent
> publications using the web paradigm. Any ideas? The documents must
> conform to the Basic Content Document vocabulary, and JavaScript and
> other like stuff is not supported.
>
> Bill Janssen recommended I find a children's book with lots of
> illustrations and put that into OpenReader. I have a few people I can
> ask to provide material, or I can glean it from PG/DP.
>
> And OSoft, along with its strategic partners, as well as LibraryCity,
> is planning to publish a lot of books and documents in the OpenReader
> format.
>
> I hope to begin work on My Antonia after I return from Book Expo early
> next week.
>
> Jon
Everyone,
I'm embarrassed to say that I lost the email, the CSS style sheet, and
most importantly the name of the person subscribed to this group who sent
me a nice style sheet for the OpenReader specs back in December or so. I
got sidetracked working on the spec, and misplaced everything.
If you're reading this, get back to me in private. Of course, look at
the latest specs at http://openreader.org/spec/ to see if you need to
update your style sheet.
Thanks, and sorry!
Jon Noring
On Wednesday, May 17, 2006, 9:05:09 PM, Jon wrote:
JN> Michael wrote:
>> Jon Noring wrote:
>>> However, here is what the XML 1.0 spec does say for production
>>> 44:
>>>
>>> [44] EmptyElemTag ::= '<' Name (S Attribute)* S? '/>'
>>>
>>> "Empty-element tags MAY be used for any element which has no content,
>>> whether or not it is declared using the keyword EMPTY. For
>>> interoperability, the empty-element tag SHOULD be used, and SHOULD
>>> only be used, for elements which are declared EMPTY."
>>>
>>> Reference: http://www.w3.org/TR/REC-xml/#NT-EmptyElemTag
>>>
>>>
>>> So, XML makes a pretty strong statement here ("SHOULD"), even if not
>>> required.
>> But note the definition of "for interoperability"
>> (http://www.w3.org/TR/REC-xml/#dt-interop):
>> "a non-binding recommendation included to increase the chances that
>> XML documents can be processed by the existing installed base of SGML
>> processors which predate the WebSGML Adaptations Annex to ISO 8879."
>> How important is it that old SGML processors be able to handle the
>> OpenReader format?
JN> Thanks. I should have checked to see how XML defined the phrase "for
JN> interoperability."
Given that such "old SGML processors" will not handle namespaces, either; and
will require an SGML declaration, and will require a DTD to be both present and
to completely describe all elements and attributes - and given that actual real
SGML processors are now updated to the new SGML version which is a superset of
XML 1.0 - not very.
This was a big deal in 1997-1998. It's not anymore.
JN> I'm close to relenting on this, given the excellent feedback by
JN> several here (csssite, Chris and Michael.) However, one last gasp
JN> before I throw in the towel on this one:
JN> The issue is stated as:
JN> Should we REQUIRE that: *publication authors* use the empty element
JN> syntax (e.g., <img/>) for all elements in both Binder and content
JN> documents that are declared EMPTY in the associated DTDs. And, when an
JN> element, which is not declared EMPTY, has no content, that it must not
JN> follow the empty element syntax (i.e., it must be <p></p>.) ???
So, precluding for example the XML being generated on the fly by XSLT, produced
by a servlet, or created on the fly from a database by XML Query, or ....
JN> Importantly note that this requirement says nothing about XML
JN> processing -- only that publication authors are required to be good.
There are not always humans involved in the output stage.
JN> So it will have no impact on user agents, but it may have an impact
JN> on "publication conformance checkers" (which I do not consider to be
JN> user agents
(agreed, in passing)
JN> -- a discussion for a different thread.)
JN> The advantages I see are admittedly sparse (others are welcome to add
JN> to this list):
JN> 1) It forces consistency in element usage.
JN> 2) It meets the "interoperability" standard of XML, even if dated as
JN> Michael notes.
No, what XML said was not to use, for example, <p/> for a paragraph with no
text.
JN> The downside is that it is one more restriction on authoring markup
JN> practice, albeit a very minor one.
JN> It also is an added item for "publication conformance checkers" to
JN> have to check.
JN> So, given the above, is there anyone who stands on the side of
JN> requirement? If not, I will relent and rewrite section 3.3.5 of the
JN> Binder and Basic Content Documents to follow the XML 1.0 spec when it
JN> recommends rather than requires it.
JN> (One aspect of this are existing XML processing and rendering tools.
JN> Are they all pretty agnostic between <foo/> and <foo></foo> regardless
JN> of whether or not the elements are declared EMPTY?)
Yes.
JN> Jon
--
Chris Lilley mailto:chris@...
Interaction Domain Leader
Chair, W3C SVG Working Group
W3C Graphics Activity Lead
Co-Chair, W3C Hypertext CG
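[Editorial sketch: the "Yes" above, on whether XML tools treat <foo/> and
<foo></foo> alike, can be checked for one parser at least. The Python below
parses both forms with the standard library's xml.etree.ElementTree and
compares the resulting trees. It shows the behavior of that one parser
only and does not prove anything about other tools. The helper names and
sample documents are assumptions made for the example.]

```python
import xml.etree.ElementTree as ET


def same_tree(a, b):
    """Compare two parsed elements by tag, attributes, text, and children."""
    if a.tag != b.tag or a.attrib != b.attrib:
        return False
    # Treat missing text and empty text as the same thing.
    if (a.text or "") != (b.text or "") or (a.tail or "") != (b.tail or ""):
        return False
    if len(a) != len(b):
        return False
    return all(same_tree(x, y) for x, y in zip(a, b))


def forms_are_equivalent(first_doc, second_doc):
    """Parse two XML strings and report whether the trees match."""
    return same_tree(ET.fromstring(first_doc), ET.fromstring(second_doc))


# The same content written with empty-element tags and with start/end pairs.
EMPTY_TAG_FORM = '<body><img src="a.png"/><p/></body>'
START_END_FORM = '<body><img src="a.png"></img><p></p></body>'

if __name__ == "__main__":
    print(forms_are_equivalent(EMPTY_TAG_FORM, START_END_FORM))
```

[After parsing, nothing in the tree records which syntax the author used,
which is why a requirement on the syntax would have to be enforced by a
separate conformance checker working on the source text.]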
Jörg Morgan Wrobel asked:
> I would like to ask you if there will be a sample eBook edition in
> the OpenReader Format, which everybody can download from your website
> and use as a kind of guideline or template to create own eBook
> editions. This might be very helpful for beginners. I do not think
> that anyone would not welcome one or maybe some. To be really useful
> the sample file should have examples of all possibilities of the
> OpenReader format.
The answer is an emphatic yes!
The first OpenReader format sample I will put together will likely be
"My Antonia". Since I want to demonstrate the feature of out-of-spine
content, plus the OpenReader namespace special elements, such as page
break and line break, I plan to ask to use Jose Menendez' version,
rather than the one I've been working on as a demonstration
illustrating some principles (mine does not include line breaks from
the original book, and Jose has added corrections, including a few not
previously recognized by Cather scholars, that thrive in the
OpenReader framework -- OpenReader is built to handle non-linear
content.) I'd like to ask for help with CSS styling for this book, and
this includes supporting multiple Style Sets, so we're not restricted
to just one particular presentation.
I hope to also take someone's XML-standards web site and put that into
OpenReader, since OpenReader is also designed to represent
publications using the web paradigm. Any ideas? The documents must
conform to the Basic Content Document vocabulary, and JavaScript and
other like stuff is not supported.
Bill Janssen recommended I find a children's book with lots of
illustrations and put that into OpenReader. I have a few people I can
ask to provide material, or I can glean it from PG/DP.
And OSoft, along with its strategic partners, as well as LibraryCity,
is planning to publish a lot of books and documents in the OpenReader
format.
I hope to begin work on My Antonia after I return from Book Expo early
next week.
Jon
Hello Jon
I would like to ask you if there will be a sample eBook edition in
the OpenReader Format, which everybody can download from your website
and use as a kind of guideline or template to create their own eBook
editions. This might be very helpful for beginners. I do not think
that anyone would not welcome one or maybe some. To be really useful
the sample file should have examples of all possibilities of the
OpenReader format.
Thank you for your answers and best regards from...
eBook Media - Jörg Morgan Wrobel
http://www.ebookmedia.de
On 17.05.2006 at 20:43, Jon Noring wrote:
> [oops, intended to send this to the group, not to Chris. So am
> resending it to openreader-format. Jon]
>
>
> Chris Lilley wrote:
> > Jon wrote:
>
> >> So, XML makes a pretty strong statement here ("SHOULD"), even if not
> >> required.
>
> > Yes, the SHOULD should be respected.
>
>
> >> Second, I strongly believe it important in OpenReader to encourage
> >> consistent practice in authoring content documents (as well as the
> >> Binder document).
> >>
> >> Thus, I have no difficulty in elevating the XML "SHOULD" to a "MUST"
> >> in OpenReader.
>
> > Please don't.
> >
> > If it's XML syntax, accept XML syntax, in its entirety, without extra
> > hoops to jump through. It's perfectly fine to suggest that authoring
> > software should use <foo/> for a given empty element. But receiving
> > software must still be able to parse <foo></foo>, and generating
> > software may on occasion generate it that way.
> >
> > Please do. Feel free to state a preference for how to author, but
> > please don't make conformant XML non-conformant with your spec; an XML
> > validator will not catch this.
>
> So far the majority consensus of those who have spoken up is to
> restore the XML default of SHOULD rather than MUST as it currently
> stands in the Binder section 3.3.5:
>
> http://openreader.org/spec/bnd10.html#sec3.3.5
>
> Stating it another way, does anyone see a strong, compelling reason to
> elevate the XML SHOULD to a MUST for OpenReader?
>
> Jon
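
To illustrate the point quoted above about <foo/> versus <foo></foo>,
here is a minimal sketch in Python. It assumes the standard library's
xml.etree.ElementTree as a stand-in for "receiving software" (nothing
in the thread names a particular parser), reuses the placeholder
element name foo from the message, and introduces a hypothetical
helper, element_signature, purely for the comparison.

```python
import xml.etree.ElementTree as ET


def element_signature(xml_text):
    """Return what a parser reports for the root element of xml_text.

    Hypothetical helper for illustration: tag name, attributes,
    text content (None normalized to ""), and number of children.
    """
    root = ET.fromstring(xml_text)
    return (root.tag, dict(root.attrib), root.text or "", len(root))


# The two spellings of an empty element discussed in the thread.
empty_tag_form = element_signature('<foo id="a"/>')
start_end_form = element_signature('<foo id="a"></foo>')

# A conforming XML parser reports the same element for both forms,
# so the choice between them is invisible after parsing.
if empty_tag_form == start_end_form:
    print("Both forms parse to the same element")
```

Since the two forms are indistinguishable once parsed, a rule requiring
one of them could only be checked against the raw text, which fits the
remark above that an XML validator will not catch it.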