Jörg Morgan Wrobel asked:
> I would like to ask you if there will be a sample eBook edition in
> the OpenReader Format, which everybody can download from your website
> and use as a kind of guideline or template to create their own eBook
> editions. This might be very helpful for beginners. I think everyone
> would welcome one, or maybe several. To be really useful
> the sample file should have examples of all possibilities of the
> OpenReader format.
The answer is an emphatic yes!
The first OpenReader format sample I will put together will likely be
"My Antonia". Since I want to demonstrate the feature of out-of-spine
content, plus the OpenReader namespace special elements, such as page
break and line break, I plan to ask to use Jose Menendez' version,
rather than the one I've been working on as a demonstration
illustrating some principles (mine does not include line breaks from
the original book, and Jose has added corrections, including a few not
previously recognized by Cather scholars, that thrive in the
OpenReader framework -- OpenReader is built to handle non-linear
content.) I'd like to ask for help with CSS styling for this book, and
this includes supporting multiple Style Sets, so we're not restricted
to just one particular presentation.
I hope to also take someone's XML-standards web site and put that into
OpenReader, since OpenReader is also designed to represent
publications using the web paradigm. Any ideas? The documents must
conform to the Basic Content Document vocabulary, and JavaScript and
other scripting of that kind is not supported.
Bill Janssen recommended I find a children's book with lots of
illustrations and put that into OpenReader. I have a few people I can
ask to provide material, or I can glean it from PG/DP.
And OSoft, along with its strategic partners, as well as LibraryCity,
is planning to publish a lot of books and documents in the OpenReader
format.
I hope to begin work on My Antonia after I return from Book Expo early
next week.
Jon
Hello Jon
I would like to ask you if there will be a sample eBook edition in
the OpenReader Format, which everybody can download from your website
and use as a kind of guideline or template to create their own eBook
editions. This might be very helpful for beginners. I think everyone
would welcome one, or maybe several. To be really useful
the sample file should have examples of all possibilities of the
OpenReader format.
Thank you for your answers and best regards from...
eBook Media - Jörg Morgan Wrobel
http://www.ebookmedia.de
On 17.05.2006 at 20:43, Jon Noring wrote:
> [oops, intended to send this to the group, not to Chris. So am
> resending it to openreader-format. Jon]
>
>
> Chris Lilley wrote:
> > Jon wrote:
>
> >> So, XML makes a pretty strong statement here ("SHOULD"), even if
> >> not required.
>
> > Yes, the SHOULD should be respected.
>
>
> >> Second, I strongly believe it important in OpenReader to encourage
> >> consistent practice in authoring content documents (as well as the
> >> Binder document).
> >>
> >> Thus, I have no difficulty in elevating the XML "SHOULD" to a
> >> "MUST" in OpenReader.
>
> > Please don't.
> >
> > If it's XML syntax, accept XML syntax, in its entirety, without extra
> > hoops to jump through. It's perfectly fine to suggest that authoring
> > software should use <foo/> for a given empty element. But receiving
> > software must still be able to parse <foo></foo> and generating
> > software may on occasion generate it that way.
> >
> > Please do. Feel free to state a preference for how to author, but
> > please don't make conformant xml non-conformant with your spec; an XML
> > validator will not catch this.
>
> So far the majority consensus of those who have spoken up is to
> restore the XML default of SHOULD rather than MUST as it currently
> stands in the Binder section 3.3.5:
>
> http://openreader.org/spec/bnd10.html#sec3.3.5
>
> Stating it another way, does anyone see a strong, compelling reason to
> elevate the XML SHOULD to a MUST for OpenReader?
>
> Jon
Michael wrote:
> Jon Noring wrote:
>> However, here is what the XML 1.0 spec does say for production
>> 44:
>>
>> [44] EmptyElemTag ::= '<' Name (S Attribute)* S? '/>'
>>
>> "Empty-element tags MAY be used for any element which has no content,
>> whether or not it is declared using the keyword EMPTY. For
>> interoperability, the empty-element tag SHOULD be used, and SHOULD
>> only be used, for elements which are declared EMPTY."
>>
>> Reference: http://www.w3.org/TR/REC-xml/#NT-EmptyElemTag
>>
>>
>> So, XML makes a pretty strong statement here ("SHOULD"), even if not
>> required.
> But note the definition of "for interoperability"
> (http://www.w3.org/TR/REC-xml/#dt-interop):
>
> "a non-binding recommendation included to increase the chances that
> XML documents can be processed by the existing installed base of SGML
> processors which predate the WebSGML Adaptations Annex to ISO 8879."
>
> How important is it that old SGML processors be able to handle the
> OpenReader format?
Thanks. I should have checked to see how XML defined the phrase "for
interoperability."
I'm close to relenting on this, given the excellent feedback by
several here (csssite, Chris and Michael.) However, one last gasp
before I throw in the towel on this one:
The issue is stated as:
Should we REQUIRE that *publication authors* use the empty-element
syntax (e.g., <img/>) for all elements in both Binder and content
documents that are declared EMPTY in the associated DTDs? And, when an
element that is not declared EMPTY has no content, that it must not
use the empty-element syntax (i.e., it must be written <p></p>)?
Importantly, note that this requirement says nothing about XML
processing -- only that publication authors are required to be good.
So it will have no impact on user agents, but it may have an impact
on "publication conformance checkers" (which I do not consider to be
user agents -- a discussion for a different thread.)
The advantages I see are admittedly sparse (others are welcome to add
to this list):
1) It forces consistency in element usage.
2) It meets the "interoperability" standard of XML, even if dated as
Michael notes.
The downside is that it is one more restriction on authoring markup
practice, albeit a very minor one.
It also is an added item for "publication conformance checkers" to
have to check.
So, given the above, is there anyone who stands on the side of
requirement? If not, I will relent and rewrite section 3.3.5 of the
Binder and Basic Content Documents to follow the XML 1.0 spec when it
recommends rather than requires it.
(One aspect of this is the existing XML processing and rendering tools.
Are they all pretty agnostic between <foo/> and <foo></foo> regardless
of whether or not the elements are declared EMPTY?)
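As one data point on that question, and not a survey of all tools: here is
a minimal sketch using Python's standard-library ElementTree parser (a
non-validating parser, so it never looks at EMPTY declarations). The
element and attribute names are only illustrative, and parse_shape is a
helper invented for this example. It suggests that, to an application
sitting above this parser, the two spellings are indistinguishable.

```python
import xml.etree.ElementTree as ET


def parse_shape(xml_text):
    """Return what an application sees for the root element:
    tag name, attributes, character data, and number of children."""
    root = ET.fromstring(xml_text)
    return (root.tag, dict(root.attrib), root.text, len(root))


# The same element written with the empty-element tag and with a
# start-tag/end-tag pair (illustrative markup, not from the spec).
minimized = parse_shape('<img src="cover.png"/>')
expanded = parse_shape('<img src="cover.png"></img>')

print(minimized)
print(minimized == expanded)
```

A checker that wanted to enforce one spelling over the other would
therefore have to look at the raw document text, since the parsed result
no longer carries the distinction.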
Jon
[oops, intended to send this to the group, not to Chris. So am
resending it to openreader-format. Jon]
Chris Lilley wrote:
> Jon wrote:
>> So, XML makes a pretty strong statement here ("SHOULD"), even if not
>> required.
> Yes, the SHOULD should be respected.
>> Second, I strongly believe it important in OpenReader to encourage
>> consistent practice in authoring content documents (as well as the
>> Binder document).
>>
>> Thus, I have no difficulty in elevating the XML "SHOULD" to a "MUST"
>> in OpenReader.
> Please don't.
>
> If it's XML syntax, accept XML syntax, in its entirety, without extra
> hoops to jump through. It's perfectly fine to suggest that authoring
> software should use <foo/> for a given empty element. But receiving
> software must still be able to parse <foo></foo> and generating software
> may on occasion generate it that way.
>
> Please do. Feel free to state a preference for how to author, but please
> don't make conformant xml non-conformant with your spec; an XML
> validator will not catch this.
So far the majority consensus of those who have spoken up is to
restore the XML default of SHOULD rather than MUST as it currently
stands in the Binder section 3.3.5:
http://openreader.org/spec/bnd10.html#sec3.3.5
Stating it another way, does anyone see a strong, compelling reason to
elevate the XML SHOULD to a MUST for OpenReader?
Jon
On Wednesday, May 17, 2006, 5:46:50 PM, Jon wrote:
JN> So, XML makes a pretty strong statement here ("SHOULD"), even if not
JN> required.
Yes, the SHOULD should be respected.
JN> Second, I strongly believe it important in OpenReader to encourage
JN> consistent practice in authoring content documents (as well as the
JN> Binder document).
JN> Thus, I have no difficulty in elevating the XML "SHOULD" to a "MUST"
JN> in OpenReader.
Please don't.
If it's XML syntax, accept XML syntax, in its entirety, without extra
hoops to jump through. It's perfectly fine to suggest that authoring
software should use <foo/> for a given empty element. But receiving
software must still be able to parse <foo></foo> and generating software
may on occasion generate it that way.
As an example of a bad decision that makes perfectly legal XML
non-conformant, XHTML 1.0 has this silly stuff about placing a space
before the / for some empty elements (e.g. <br /> not <br/>). This is a
royal PITA because a conformant XML output engine has to be specially
filtered to make it do this, which is allowed but not required by the
XML spec. Or, in practice, just ignore the silly rule.
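To illustrate the point about generating software with one concrete
serializer (Python's standard-library ElementTree, picked here only as an
example and not something named in this thread): the spelling of an empty
element tends to be a serializer-wide setting rather than something an
application chooses tag by tag. A minimal sketch:

```python
import xml.etree.ElementTree as ET

br = ET.Element("br")

# Default output: this particular serializer writes the minimized form,
# and happens to put a space before the slash.
minimized = ET.tostring(br, encoding="unicode")

# The same element with minimized tags switched off for the whole output.
expanded = ET.tostring(br, encoding="unicode", short_empty_elements=False)

print(minimized)
print(expanded)

# Both spellings parse back to the same empty element.
reparsed = [ET.fromstring(text) for text in (minimized, expanded)]
same = all(el.tag == "br" and el.text is None and len(el) == 0 for el in reparsed)
print(same)
```

A rule that demanded one exact spelling would have to be met by
configuring or post-filtering the serializer, which is the extra hoop
being described.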
JN> The burden on user agent developers, conformance
JN> checking tool developers, and publication authors is quite minimal.
I disagree. It gives generating tools an extra hoop to jump through.
JN> (Now if this were a major burden on any of these stakeholders, then we
JN> have to be more careful when setting "MUST" requirements.)
JN> So, what does everyone else think? Should we reset the "MUST" to a
JN> "SHOULD" in Section 3.3.5 to reflect exactly what XML 1.0 says?
Please do. Feel free to state a preference for how to author, but please
don't make conformant xml non-conformant with your spec; an XML
validator will not catch this.
--
Chris Lilley mailto:chris@...
Interaction Domain Leader
Chair, W3C SVG Working Group
W3C Graphics Activity Lead
Co-Chair, W3C Hypertext CG
csssite wrote:
> Note that <element/> and <element></element> are equivalent
> representations of an empty element (i.e. an element that has no content
> and that may or may not be declared in the DTD as EMPTY).
> <br></br> and <script type="application/ecmascript" src="script.js"/>
> are both allowed according to the XML 1.0 recommendation. Overall, it is
> better to remove comments like this that are not relevant to the Binder
> specification.
>
> Quote from http://openreader.org/spec/bnd10.html#sec3.3.5
> "Some elements in a DTD may be declared EMPTY. When used in an XML
> document, these elements must not contain any content and must use the
> empty-element syntax (also known as 'minimized form') as specified in
> XML 1.0."
Thanks, csssite, for taking the time to read through the spec
thoroughly and provide feedback. It's good to see several people do
so.
The Binder section csssite quoted from, Section 3.3, is a sort of
informative section which is also duplicated in the Basic Content
Document 1.0 spec ( http://openreader.org/spec/bcd10.html ). I've
been thinking of moving that whole section to another document,
probably called a 'note' rather than a spec following W3C practice.
But for the time being the redundancy is being kept simply because
of time limitations.
csssite is right. XML does not require what Section 3.3.5 of the Binder
requires. However, here is what the XML 1.0 spec does say for production
44:
[44] EmptyElemTag ::= '<' Name (S Attribute)* S? '/>'
"Empty-element tags MAY be used for any element which has no content,
whether or not it is declared using the keyword EMPTY. For
interoperability, the empty-element tag SHOULD be used, and SHOULD
only be used, for elements which are declared EMPTY."
Reference: http://www.w3.org/TR/REC-xml/#NT-EmptyElemTag
So, XML makes a pretty strong statement here ("SHOULD"), even if not
required.
Second, I strongly believe it important in OpenReader to encourage
consistent practice in authoring content documents (as well as the
Binder document).
Thus, I have no difficulty in elevating the XML "SHOULD" to a "MUST"
in OpenReader. The burden on user agent developers, conformance
checking tool developers, and publication authors is quite minimal.
(Now if this were a major burden on any of these stakeholders, then we
have to be more careful when setting "MUST" requirements.)
So, what does everyone else think? Should we reset the "MUST" to a
"SHOULD" in Section 3.3.5 to reflect exactly what XML 1.0 says?
Jon
Ben Trafford wrote:
> Jon Noring wrote:
>>I encourage everyone to read through the Binder spec and note any
>>issues you have with it, and post them here to openreader-format for
>>discussion.
Ben, thanks for your thorough reading of the Binder spec and taking
the time to post your thoughts and concerns. I'll provide some brief
answers to some of your points, and hope that others chime in with
their thoughts.
> Throughout the specification, you use short-forms for
> attribute names...for example, "resid" instead of "resource_id" or
> "ResourceID." I don't see this as an authoring simplification, since
> most authoring will either use XML editors that will insert the
> attributes automatically, or will be electronically converted from
> other sources. The Binder documents will be complex enough; why muddy
> the waters with unreadable names? Why not use full names for all
> markup, and make the Binders more human-readable?
Good point. Being one who hand-authors a lot of XML documents, one
requirement I had established was that attributes need to be short in
length, but at least descriptive enough to be understandable without a
lot of mental effort. It seems like maybe some of the attributes are
too short.
What do others think?
Human readability (which includes understanding the structure of the
content data) is definitely important. I notice that quite a few XML
applications make the assumption that the XML will always be "under
the hood", so many end up being difficult for human beings to read and
comprehend. It doesn't help when there are multiple namespaces all
blended together. It gets to be 'colon cancer.'
An example of complexity is the XML serialization of RDF, which is
what PSWG had decided back in 2002 to use for the OEBPS 2.0 Package
(which never got finished.) Btw, there are some who would like to see
the Binder reformulated in RDF, and there are a couple arguments that
can be made in that direction. I welcome everyone's input on the topic
of RDF for all or part of the Binder.
> In section 4.3.3., why use URNs and URIs, which include both
> URLs and URNs? I don't understand the need for the limitation.
Hmmm, you'll have to elaborate on this one. Sorry. (It may be an
apples/oranges thing, maybe something else.)
> Section 4.4 could use some simplifying and expansion. To old
> hypertext geeks like myself, a resource is the best way to describe
> something. But you really ought to provide a definition and examples
> of what is a resource, for the non-hypertext people.
Hmmm, yes. I have seen the word 'resources' used as you describe.
I assume you have no difficulty with using the word 'resource' in this
context so long as it is rigorously defined?
Or should an adjective be placed in front of 'resources' to fine tune
its meaning throughout the Binder spec?
Any suggestions for either a rigorous definition of 'resource' or an
adjective to place in front of it?
> Section 4.6.2. could do with an extensibility mechanism.
> Surely, there will be other forms of usage than OEB and Web -- print,
> for example. Couldn't you define an "other," and then a sub-element
> within the UserSet to define what that other is, with fallbacks to
> OEB or Web? Yes, this adds a level of complexity, but if you state
> the user agent is only required to use the fallback, it would be
> forward-thinking.
Interesting.
I recall the debate back in 1999/2000, during the original OEBPS
discussion about adding extensibility mechanisms, over whether this
might lead to abuse/proprietization of the spec by the 800-pound gorillas.
Although OEBPS has not been so hijacked, I did observe troubling
developments where the extensibility mechanism was used for user agent
features/functionality that OEBPS should have enabled in a more
universal manner (e.g., Microsoft's extensibility for adding covers
and thumbnails, and its odd extension of Tours which didn't do
anything above and beyond what OEBPS Tours already defined.)
My view is that, at least for the first edition of OpenReader, we should
close most if not all extensibility doors, and let user agent developers and
others in the digital publication "ecosystem" come to the working
group and ask for some feature/function they can't enable using the
current spec. This way the function/feature may be enabled in a way
which benefits everyone, and makes sure it is constrained so it does
not give undue advantage to one vendor over another.
Looking at it another way, it is better to be conservative from the
start, and then "open up" in a controlled fashion as needed. It is
easy to add a feature/function to the spec, but removing a feature or
function when it proves to be problematic (especially if it leads to
"balkanization" of publications and user agents), is much more
difficult because removing it may break existing publications using
what was previously allowed. It's like having to deal with a bad
haircut -- it hangs around for quite a while.
Regarding Ben's specific point of adding "other" to the list of
presentation modes: this is especially troubling to me in that it is
an extensibility mechanism for the quite fundamental (and important)
rendering/presentation of publications. If a proprietary vendor wanted
to hijack the spec, this is a major point of attack. Maybe I'm being a
little too paranoid, but as noted above, we need to be careful not to
open up the door too much. Rather than just fully opening the door
without peeking behind it first, I think we should keep the door closed
until there is a real need to open it, and then do so slowly, peeking
through the cracks to see what's behind it.
(Btw, I am intrigued with Ben's suggestion for a 'print' mode. This is
already addressed, to a certain extent, in the Linear functional part
(see Section 4.10: http://openreader.org/spec/bnd10.html#sec4.10 ) and in
the Styling Set media setting of 'print' (see Section 4.16.3:
http://openreader.org/spec/bnd10.html#sec4.16.3 ). But I am aware of
the need for high-quality printing of digital publications. The Prince
PDF application, developed by Michael Day's company YesLogic, is one
example.)
> In sections 4.9 through 4.15, I think the use of
> 'residrefs="intro chap1 chap2"' is clumsy. Processing outside the XML
> paradigm to look for spaces in an attribute value is problematic. Why
> not break it down into separate elements?
('residrefs' is described in Section 4.2.3 of the Binder spec:
http://openreader.org/spec/bnd10.html#sec4.2.3 )
Hmmm, I really thought it to be "anti-clumsy," but to each their own.
Here are some of my thoughts:
As you no doubt know, the datatype for 'residrefs' is IDREFS, which is
specifically recognized in XML as a white space separated list of one
or more IDREF values. See http://www.w3.org/TR/REC-xml/#idref
From a readability perspective, 'residrefs' is used to reference
multiple resource IDs for some object. If we switched to multiple
elements to assign the same information, we end up with a redundant
sequence of elements, which bloats the Binder document and makes
readability more difficult.
The real question is about user agent processing. How difficult is it
for a user agent developer to write code to parse the space separated
values in 'residrefs'? I believe it is pretty trivial, actually,
although I'd like to hear from developers (Gary?). Machine parsing is
aided in that all XML processors must normalize IDREFS attribute
values before passing them on to the application, as noted in the XML
spec:
http://www.w3.org/TR/REC-xml/#AVNormalize
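To give a feel for the amount of code involved, here is a minimal sketch
in Python. The element names (binder, resource, linear) and the helper
resolve_residrefs are invented for illustration and are not taken from
the Binder spec; only the 'resid' and 'residrefs' attribute names come
from the discussion above. ElementTree is a non-validating parser and
does not know the attribute is declared IDREFS, so the sketch does not
rely on IDREFS normalization; splitting on any run of whitespace gives
the same list either way.

```python
import xml.etree.ElementTree as ET

# Hypothetical Binder-like fragment, with deliberately untidy whitespace
# inside the residrefs value.
SAMPLE = """<binder>
  <resource resid="intro"/>
  <resource resid="chap1"/>
  <resource resid="chap2"/>
  <linear residrefs="  intro chap1
      chap2 "/>
</binder>"""


def resolve_residrefs(root, element):
    """Split a residrefs value into IDs and check each one is declared."""
    known = {res.get("resid") for res in root.iter("resource")}
    # str.split() with no argument splits on any whitespace run and
    # drops leading/trailing whitespace.
    refs = element.get("residrefs", "").split()
    missing = [ref for ref in refs if ref not in known]
    if missing:
        raise ValueError("undeclared resource IDs: " + ", ".join(missing))
    return refs


root = ET.fromstring(SAMPLE)
refs = resolve_residrefs(root, root.find("linear"))
print(refs)
```

The splitting itself is a single call; most of the sketch is the check
that every referenced ID actually exists, which a validating parser
would perform on its own for a true IDREFS attribute.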
So what does everyone else think? Should the IDREFS datatype never be
used in XML documents, or not used for the purpose of 'residrefs' as
defined in http://openreader.org/spec/bnd10.html#sec4.2.3 ?
Ben (and others), in the XML community, is it now considered bad form
to use an attribute having datatype IDREFS? Or does it depend upon the
application? Is IDREFS considered something that should not have been
supported by XML in the first place?
> In section 4.12, the idea of "element substitution" is
> confusing, especially for people who are coming at this from an
> electronic publishing, and not an XML, perspective. Why not call it
> what it is, which is "content substitution?" It'd be much clearer, I
> think.
Yes, "element substitution" is clumsy, but the dilemma I was facing is
that the Binder has to enable two types of "content substitutions"
(where my use of 'content' here is beyond just text content):
1) Substitute the contents of an element in a content document (where
'content document' is defined to be an XML document containing
textual content.)
2) Allow user agents to substitute non-text media resources, such as
images, video, and audio, with other non-text media equivalents if
provided.
So one is element-based, the other is resource-based. Initially I had
combined the two functions into one functional part called
'substitution', but it really got clumsy since it had two parts, each
of which used its own syntax.
Anyway, open to ideas here...
> In section 4.15, the idea of "thumbnails" is fine, but
> couldn't it be expanded? For example, couldn't I provide a snippet of
> music, instead of the whole MP3? A thumbnail is really a sample of a
> given resource; why not define it as such?
As I was mulling over the 'thumbnail' capability, its use and who
would use it, and its relation to "cover art" (which may include video,
audio, etc.), I decided that enough doubt existed as to be cautious
and restrict thumbnails for the time being to raster images.
As previously noted, it is easy to expand resource type support for
thumbnails, but difficult to remove support if we are overbroad from
the start. In the case of thumbnails, I'd like to see how publication
authors, user agent developers, and retailers will use thumbnails, and
get their feedback of what they'd like if not already provided.
> In section 4.1.6, why limit the stylesheet selection to CSS?
> Why not XSLT and XSL-FOs?
I probably need to make it even more clear in the Binder spec that the
Binder does NOT specify any particular styling mechanism. It uses CSS
2.1 for examples only. The OpenReader Publication Framework 1.0 spec
will specify CSS 2.1 support (and of course in the future styling
language support could be expanded as needed following the philosophy
of being conservative and then opening up the door in controlled
fashion.)
Anyway, here's what I currently say in the Section 4.16 introduction.
Note especially the last paragraph:
(from http://openreader.org/spec/bnd10.html#sec4.16 )
"The optional Styling, a functional part of the User Set, may be
used to apply cascade-type styling (such as CSS) to the content
document resources of the User Set.
"The Styling functional part is an innovative feature not
implemented in prior generation XML-based publication frameworks,
such as OEBPS. Instead of having styling instructions haphazardly
spread within all the content documents and in "style sheets"
remotely referenced by individual content documents, the Styling
functional part centralizes, for the Publication author, all the
styling into a unified and powerful framework within the Binder
Document. This gives Publication authors a greater degree of
flexibility, simplicity, and control over Publication styling.
"The overarching publication framework which references this
specification defines the cascade-type styling language(s)
supported, and the role, if any, that embedded and referenced
styling within content documents is given, including the priority
of applying the styling information."
> In section 4.17, you discuss the use of pointers? Yay for
> pointers! But why not use XPointer for the #pointer? Includes simple
> Web-style mechanisms as well as advanced pointing. Also, where is
> XInclude in all this? I like XInclude.
I'll address Ben's two comments in reverse order.
XInclude is cool, but after trying to make XInclude work with content
documents (such as the Basic Content Document 1.0 spec, see
http://openreader.org/spec/bcd10.html ), I decided to defer XInclude
support for a later time, including for the Binder.
The biggest problem with XInclude for content documents was how to
handle external links into OpenReader Publications when the target is
an XInclude fragment (especially some spot *in* an XInclude fragment)
that may be used multiple times in various places in the publication.
It's doable, but got a little too messy for something which may never
be used that much in OpenReader Publications. Plus, it added another
level of complexity to user agent developers who already have a lot on
their plate to implement OR 1.0 as it now stands. So I decided to
punt, but that doesn't mean we can't implement it in the future. We
will get the ball back, using the football metaphor. <smile/>
(Btw, note the mention of XInclude in Section 7 of the Basic Content
Document spec: http://openreader.org/spec/bcd10.html#sec7 )
Regarding the pointer issue, the current draft Binder says that for
purposes of Navigation Sets, pointing *must* be done to elements, not
to text within elements (a future version of the Binder could expand
support for text-based pointing, although for the purpose of publisher
defined navigation, besides indexing which I have ideas on how to
enable, I'm not sure we'll ever see a need for that.) So from the
XPointer context, we are talking either about fragment identifiers
("#elemid") or the elem() scheme. Also note the following from Section
4.17.3.2:
"(The overarching framework, which references the Binder Document
specification, defines the pointer reference scheme.)"
http://openreader.org/spec/bnd10.html#sec4.17.3.2
So it will be in the OpenReader Publication Framework 1.0 Spec (still
to be written) which will define the allowed pointer scheme. I'm
thinking for the time being to require user agents to recognize
fragment identifiers (which is what all browser codebases do anyway)
but nothing more. At a future time we could add required elem()
support. One reason I am reluctant to support even elem() for OR 1.0
has to do with maintaining the permanence of external links into
OpenReader Publications. The 2003 discussion in PSWG about linking into
OpenReader Publications opened up a lot of eyes as to the problems and
pitfalls of linking into Publications which themselves may morph over
time, plus the fact that many publishers themselves will not follow
proper markup and identifier assignment practices to maintain the
integrity of existing links. Given these challenges, we have to do the
best we can.
So, again, I think it better to start restricted, then open up as we
gain experience and confidence. I don't believe restricting pointing
to fragment identifiers will impede the adoption of the OpenReader
format standard. We just stress that publication authors should add ids
to at least all block level elements and important inline level
elements. Authoring tools can and should do this automatically.
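As a rough idea of what such automatic id assignment might look like in
an authoring tool, here is a minimal Python sketch. The set of
block-level element names, the "e" prefix, and the function name
add_missing_ids are all assumptions made for this example; none of them
comes from the Basic Content Document spec. Existing ids are never
changed, since changing them would break external links that already
point at them.

```python
import xml.etree.ElementTree as ET

# Assumed block-level element names, for illustration only.
BLOCK_TAGS = {"p", "div", "h1", "h2", "h3", "blockquote", "li"}


def add_missing_ids(root, prefix="e"):
    """Give every block-level element without an id a new, unused one.

    Returns the number of ids added. Existing ids are left untouched.
    """
    used = {el.get("id") for el in root.iter() if el.get("id")}
    counter = 0
    added = 0
    for el in root.iter():
        if el.tag in BLOCK_TAGS and el.get("id") is None:
            counter += 1
            # Skip over any candidate that is already taken.
            while f"{prefix}{counter}" in used:
                counter += 1
            new_id = f"{prefix}{counter}"
            el.set("id", new_id)
            used.add(new_id)
            added += 1
    return added


doc = ET.fromstring(
    '<body><h1 id="e1">My Antonia</h1><p>First.</p>'
    '<p id="intro">Second.</p><p>Third.</p></body>'
)
count = add_missing_ids(doc)
ids = [el.get("id") for el in doc.iter() if el.tag in BLOCK_TAGS]
print(count, ids)
```

Ids generated this way are only useful for link permanence if they are
then kept stable across later editions of the publication, so a real
tool would run this once and preserve the result rather than renumber
on every save.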
(Now, I would not restrict user agents from supporting elem(), but it
would not be required for the time being. We'll see where things go.)
> Those are my initial thoughts. Hope you find them useful.
Definitely! And I hope others will comment on Ben's points (and my
replies), as well as dig through the spec and come up with other
issues.
Jon
Note that <element/> and <element></element> are equivalent
representations of an empty element (i.e. an element that has no content
and that may or may not be declared in the DTD as EMPTY).
<br></br> and <script type="application/ecmascript" src="script.js"/>
are both allowed according to the XML 1.0 recommendation. Overall, it is
better to remove comments like this that are not relevant to the Binder
specification.
Quote from http://openreader.org/spec/bnd10.html#sec3.3.5
"Some elements in a DTD may be declared EMPTY. When used in an XML
document, these elements must not contain any content and must use the
empty-element syntax (also known as 'minimized form') as specified in
XML 1.0."
At 06:10 PM 5/16/2006, Jon Noring wrote:
>I encourage everyone to read through the Binder spec and note any
>issues you have with it, and post them here to openreader-format for
>discussion.
Jon,
A few comments:
Throughout the specification, you use short-forms for
attribute names...for example, "resid" instead of "resource_id" or
"ResourceID." I don't see this as an authoring simplification, since
most authoring will either use XML editors that will insert the
attributes automatically, or will be electronically converted from
other sources. The Binder documents will be complex enough; why muddy
the waters with unreadable names? Why not use full names for all
markup, and make the Binders more human-readable?
In section 4.3.3., why use URNs and URIs, which include both
URLs and URNs? I don't understand the need for the limitation.
Section 4.4 could use some simplifying and expansion. To old
hypertext geeks like myself, a resource is the best way to describe
something. But you really ought to provide a definition and examples
of what is a resource, for the non-hypertext people.
Section 4.6.2. could do with an extensibility mechanism.
Surely, there will be other forms of usage than OEB and Web -- print,
for example. Couldn't you define an "other," and then a sub-element
within the UserSet to define what that other is, with fallbacks to
OEB or Web? Yes, this adds a level of complexity, but if you state
the user agent is only required to use the fallback, it would be
forward-thinking.
In sections 4.9 through 4.15, I think the use of
'residrefs="intro chap1 chap2"' is clumsy. Processing outside the XML
paradigm to look for spaces in an attribute value is problematic. Why
not break it down into separate elements?
In section 4.12, the idea of "element substitution" is
confusing, especially for people who are coming at this from an
electronic publishing, and not an XML, perspective. Why not call it
what it is, which is "content substitution?" It'd be much clearer, I think.
In section 4.15, the idea of "thumbnails" is fine, but
couldn't it be expanded? For example, couldn't I provide a snippet of
music, instead of the whole MP3? A thumbnail is really a sample of a
given resource; why not define it as such?
In section 4.1.6, why limit the stylesheet selection to CSS?
Why not XSLT and XSL-FOs?
In section 4.17, you discuss the use of pointers? Yay for
pointers! But why not use XPointer for the #pointer? Includes simple
Web-style mechanisms as well as advanced pointing. Also, where is
XInclude in all this? I like XInclude.
Those are my initial thoughts. Hope you find them useful.
--->Ben
Everyone,
Some of you may have already seen the announcement on the TeleRead
blog, but if not, the first working draft of the OpenReader Binder
specification is finished. The spec document is found in its permanent
home at:
http://openreader.org/spec/bnd10.html
For the TeleRead blog article, which contains more background
information on the Binder, see:
http://www.teleread.org/blog/?p=4860
The Binder is the core, the "heart and soul" if you will, of the
OpenReader Publication Format. The Binder is similar to the OEBPS
Package in that it organizes all the resources that make up a
publication, and enables useful and powerful features for ebook
reading applications (user agents) -- features which benefit
publishers, end-users, and others in the digital publication
"ecosystem."
With the first working draft of the Binder spec now released, there is
now a "there" there which we can discuss. (It also absolves others
of having to write a portion of the spec! -- I will continue to be
the spec editor and do all the dirty work -- you get to have all the
fun.)
I encourage everyone to read through the Binder spec and note any
issues you have with it, and post them here to openreader-format for
discussion.
Btw, I am thinking of setting up a one-time teleconference
presentation (probably via Skype) where I will go over the Binder
spec in detail, and address questions. Let me know if you would be
interested in attending this teleconference. I'm hoping with enough
people attending, we can set up bi-weekly conference calls so we can
coordinate the completion of the format specification (and the
associated "sub-specs" -- the OpenReader Format spec will actually
comprise a set of specs, of which the Binder is the most critical.)
Looking forward to your feedback, and thanks for the feedback already
received from several of you! The contributors list in the spec (see
the title page) will certainly grow -- I'd like to see *your* name in
there.
I'll be at Book Expo this week in Washington, DC. If you plan to
attend, look for me. I'll be hanging around the OSoft/Rosetta
Solutions booth (Booth #3729). You can reach me by cell phone at
801-230-8881. Looking forward to seeing some of you at Book Expo!
Jon Noring
[Something similar to this was posted to The eBook Community, or TeBC.
I'm trying to separate the discussion on the two groups should
discussion continue.]
Bill wrote:
> http://www.teleread.org/blog/?p=4509, entitled "What's the Point of
> OpenReader?", makes some points about goals which perhaps should be
> included for the OpenReader format, such as a purely textual
> single-file format.
OpenReader could certainly support, at a future time, other
vocabularies, and this may include regularized text (such as ZML which
Bowerbird has developed) where it is a pretty simple thing for an
existing OpenReader user agent to build a module to translate the
regularized text document into XHTML or other XML vocabulary before
shoving it into the XML-based rendering engine.
(If any doc format can be translated into a clean, structurally-based
XML document -- to differentiate it from presentationally-based XML
such as gawd-awful Word HTML dumps -- that format could certainly be
considered for OpenReader. Regularized text is one such candidate
document format to add support for, and an intriguing one at that. An
alternative is to write a regularized plain text to OpenReader format
converter, where the document itself is converted into an XML document
*before* inclusion in the OpenReader Publication. This brings up one
type of OpenReader authoring system -- a plain text editor with an
automated post-processing step.)
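(A minimal Python sketch of such a post-processing step. It assumes a made-up convention where paragraphs are separated by blank lines; it is not ZML, and the function name and output skeleton are purely illustrative.)

```python
import re
from xml.sax.saxutils import escape

def text_to_xhtml(text, title="Untitled"):
    """Convert blank-line-separated plain text into a minimal XHTML document."""
    # Split on blank lines, then collapse hard line wraps inside each paragraph.
    blocks = re.split(r"\n\s*\n", text.strip())
    paragraphs = [" ".join(block.split()) for block in blocks if block.strip()]
    # Escape &, < and > so the result stays well-formed XML.
    body = "\n".join(f"    <p>{escape(p)}</p>" for p in paragraphs)
    return (
        '<html xmlns="http://www.w3.org/1999/xhtml">\n'
        f"  <head><title>{escape(title)}</title></head>\n"
        "  <body>\n"
        f"{body}\n"
        "  </body>\n"
        "</html>\n"
    )
```

A real regularized-text converter would also recognize headings, emphasis and so on; the point here is only that the XML is produced *before* the document enters the Publication.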
> (Oddly enough, these both appear on David Rothman's blog, rather than
> appearing on the openreader-format mailing list ostensibly set up to
> discuss this issue. You'd think that David, being an OR principal,
> would encourage Roger to either post to this mailing list in the first
> place, or to at least post pointers to this mailing list. Perhaps in
> the future... :-)
Well, David's blog has a much larger readership than does this
openreader-format! (And probably more than The eBook community, too.)
Also, Roger just recently joined openreader-format, so I hope he plans
to post more there.
But, yes, I hope OpenReader format stuff will be posted to
openreader-format. Since David's role in OpenReader is to assist with
the inevitable "external relations" (i.e., "schmoozing" or "politicking"),
his focus is not on openreader-format, which is mostly technically
oriented anyway (that's not his forte). openreader-format is under my
oversight, and of late I've been burning the midnight oil completing the
first very preliminary draft of the OpenReader format for submission to various
places for focused attention. Unlike major companies who can throw 20
experts full-time at a problem without blinking an eye, we don't have the
same luxury. We're just a bunch of unpaid mercenaries with families to
feed as our number one priority. <laugh/>
As I noted in a prior message to openreader-format, Adobe submitted a
quite complete draft of the container format document to the Container WG
on the very first day, and this greatly assisted with technical discussion.
That spec certainly has evolved a lot since then, but that's not the point.
The point is that it broke through the discussion barrier as well as the
"where do we start?" problem. So, too, with the draft OpenReader format --
having a real, live draft is something tangible to encourage focused
discussion (and of course to attract the inevitable rotten eggs and
tomatoes. <smile/>) Until something is committed to digital paper, it's
hard to discuss that something -- there's nothing tangible to grab onto --
it continues to be an academic discussion that goes around and around in
circles until someone takes the bull by the horns to break out of the
circle.
Jon Noring
Bill wrote:
> Nick Bogaty, on the Ebook Community mailing list, points out that the
> IDPF has released a container spec, at
> http://idpf.org/doc_library/informationaldocs/ocf10-20060421.pdf.
Well, Bill, you beat me to the punch again! I was going to post here
today on the public release of the preliminary *draft* Container spec.
This release is part of IDPF's beginning to find "religion" in opening
up its working groups' activities. Prior to this year, IDPF working
groups were more closed to the public. IDPF has a ways to go, however,
before it reaches the openness we find in other standards groups,
particularly OASIS. As an aside, my personal view is that IDPF should
simply move all its spec activities over to OASIS since that's where
the action is taking place in the whole digital publication arena. IDPF
(and I suppose OpenReader as it now stands) is out of sync.
(Btw, we at OpenReader are investigating moving our work over to OASIS,
and may be close to getting the required institutional support. Anyone
here in openreader-format able to help get your institution to back
our application to OASIS? BillJ, does your org have a membership in
OASIS?)
Let me note again that the spec released by IDPF is not the final
specification, but a fairly stable early draft of the IDPF container
spec. It is being released in public in order to get public feedback,
which is good. So I urge everyone here to study the draft spec, and if
you see anything amiss, or have ideas to improve it, the Container
Working Group welcomes your input. I do urge you to post your feedback
here as well as sending it to IDPF to assure it is seen by others.
Btw, both Lee Passey and yours truly are contributors to the Container
Working Group as invited experts. Lee, especially, has contributed a
whole lot to the effort. Interestingly, though, if you subtract Lee and
myself, and the two companies (Adobe and ETI) who petitioned the IDPF
Board to charter the working group, there's little left. Sure, a bunch
of publishers and a few others have gotten involved and contributed
requirements and even got involved in some tech-talk, but compared to
the input from Adobe, ETI and Lee, all other input has been relatively
minor. (Adobe and ETI are also doing all the spec document editing.)
It is interesting to note that almost immediately after the Container
WG was chartered, Adobe submitted a first "draft" for the container
based on the ODF Container, which is essentially what the current spec
embraces. It appears about three people at Adobe contributed substantial
time to research and author this draft, meaning that possibly over a
hundred man-hours were spent by Adobe *before* the working group was
chartered. Bill McCoy, in his blog, noted the speed of the ContainerWG,
but he failed to mention that considerable resources were already spent
by Adobe and ETI *before* the WG was chartered. So when the
ContainerWG started, we did not start at square one, but started with
something pretty tangible. Plus, the ContainerWG spec is nowhere near
as complex as publication frameworks such as OEBPS and OpenReader.
I'd say the container is only 10% as complex. So with
these factors, two months is definitely possible (I think it's a little
more than that.) It's nothing to crow about. Any spec can get out in
two months if one throws enough money at it.
For the record...
Jon Noring
(p.s., the current plans are that the OpenReader container will be
compatible with the OEBPS Container, but certainly not conformant
since the OEBPS Container requires a conforming OEBPS Publication
inside, which OpenReader will not be. However, compatibility is
important so user agent developers don't have to work hard to support
both container formats. So, BillJ and the others, I request that you look
over the draft IDPF Container spec.)
Some interesting comments by Roger Sperberg on the OpenReader format:
http://www.teleread.org/blog/?p=4473 asks whether OpenReader should
"consider OpenDocument's goals and principles", and includes a long
discussion by various interested parties. It also includes a pointer
to Sophie, "an open-source platform for creating and reading
electronic books for the networked environment"
(http://www.futureofthebook.org/blog/archives/2006/03/sophie_is_coming_1.html)
Hmmm. Isn't that Nvu (http://www.nvu.com/)?
http://www.teleread.org/blog/?p=4509, entitled "What's the Point of
OpenReader?", makes some points about goals which perhaps should be
included for the OpenReader format, such as a purely textual
single-file format.
(Oddly enough, these both appear on David Rothman's blog, rather than
appearing on the openreader-format mailing list ostensibly set up to
discuss this issue. You'd think that David, being an OR principal,
would encourage Roger to either post to this mailing list in the first
place, or to at least post pointers to this mailing list. Perhaps in
the future... :-)
Bill
Bill wrote:
> Since I hadn't seen this come across the list yet, I thought I'd take
> the liberty of forwarding it. I still can't find any links on the
> OpenReader web site about these documents.
Thanks for forwarding this information! I've been remiss lately in not
keeping openreader-format abreast of new developments.
Yes, I do plan to update the openreader.org site to include links to
the latest documents and information (appended below.)
I encourage, and look forward to, feedback on the current spec
modules. I am working hard on finishing the draft Binder spec, which
is the real "meat" of the specification (what's there now is the DTD
and a complex example). I hope to have the draft complete in a couple
weeks.
Thanks.
Jon Noring
*****
Here are the links to the existing documents:
Basic Content Document 1.0 draft (essentially finished, a newer draft
than the one Bill posted above):
http://www.openreader.org/orp-development/spec/bcd10-draft2006-04-01.html
Basic Content Document 1.0 draft DTD:
http://www.openreader.org/orp-development/dtd/bcd10.dtd
Binder Document 1.0 DTD (draft):
http://www.openreader.org/orp-development/dtd/bnd10.dtd
Binder Document 1.0 (example of a complex OpenReader publication):
http://www.openreader.org/orp-development/workfiles/binderexample-2006-04-03.xml
By and large, the architectural core of the proposed OpenReader 1.0 is
now done. Anyone familiar with OEBPS 1.2 will recognize the Binder
example (and DTD) as being analogous to the OEBPS Package Document (the
name has been changed from Package to Binder for reasons I won't discuss
here.)
There's now enough "meat" in the above information for OEBPS experts
to evaluate and understand the OpenReader 1.0 spec -- even for those
who will develop OpenReader applications to move ahead full steam. But
for those who are not very familiar with OEBPS, they will need a little
more documentation.
Since I hadn't seen this come across the list yet, I thought I'd take
the liberty of forwarding it. I still can't find any links on the
OpenReader web site about these documents.
Bill
------- Forwarded Message
From: Jon Noring <jon@...>
Date: Sat, 15 Apr 2006 17:18:56 PDT
Subject: Re: [ebook-community] Another tragedy of the commons?
[... much removed ...]
Yes, I've been negligent in posting to the 'openreader-format'
YahooGroup regarding the latest status of the specification (which
actually comprises module specifications). Currently, we have the
following finished (so this is the latest update, more recent than
what you mentioned, Bill):
Basic Content Document 1.0 draft (essentially finished, a newer draft
than the one Bill posted above):
http://www.openreader.org/orp-development/spec/bcd10-draft2006-04-01.html
Basic Content Document 1.0 draft DTD:
http://www.openreader.org/orp-development/dtd/bcd10.dtd
Binder Document 1.0 DTD (draft):
http://www.openreader.org/orp-development/dtd/bnd10.dtd
Binder Document 1.0 (example of a complex OpenReader publication):
http://www.openreader.org/orp-development/workfiles/binderexample-2006-04-03.xml
By and large, the architectural core of the proposed OpenReader 1.0 is
now done. Anyone familiar with OEBPS 1.2 will recognize the Binder
example (and DTD) as being analogous to the OEBPS Package Document (the
name has been changed from Package to Binder for reasons I won't discuss
here.)
There's now enough "meat" in the above information for OEBPS experts
to evaluate and understand the OpenReader 1.0 spec -- even for those
who will develop OpenReader applications to move ahead full steam. But
for those who are not very familiar with OEBPS, they will need a little
more documentation.
So, I'm now working on the Binder Document 1.0 Specification draft --
about half done with that -- it is going slow, but I plan to pick
up speed the next few days and get something out within a week or so
(I had planned to get it done 2 weeks ago, but all kinds of problems
and other things came up that had to be dealt with. The tyranny of
the urgent.)
What's left to write are the other modular specs, the most important
being the "OpenReader Framework 1.0 Specification", the "mother ship"
document which ties everything together (essentially referencing the
above mentioned specs.) CSS will essentially be CSS 2.1 -- OEBPS's
support for CSS was more limited and was laboriously detailed -- no
need to do that subsetting for OpenReader. With respect to size and
complexity, the "mother" framework specification should be fairly
compact.
[... more removed ...]
------- End of Forwarded Message
[cc: Garth Conboy]
In the past here on openreader-format we spent some time discussing
the encapsulation of the OpenReader Framework Specification. We
explored multi-part MIME, ZIP, tar, gzip, XML-based, and other
approaches. Late last year, IDPF started a working group to develop an
OEBPS Container format. Because of the similarities between OpenReader
and OEBPS, we deferred further discussion on the encapsulator to see
what will result from the IDPF Container effort.
Both Lee Passey and I have been fairly active in the Container Working
Group technical discussion, including attending (remotely) the
Face-To-Face meeting recently held in NYC.
Today IDPF just released the requirements document for the "Unified
OeBPS Container Format Version 1.0." Mind you, this is not the spec --
it is simply the proposed list of requirements to which the final spec
will conform. I urge you to go over it and if you feel the
requirements are deficient or insufficient, let IDPF know (refer to
the URL below.)
The document is referenced at:
http://www.idpf.org/doc_library/informationaldocs.htm
(I'm chagrined there is no HTML version, but then the keepers of the
Container WG documents are working in Word, so I guess PDF dumps are
easier for them than markup, although the Requirements document is so
simple it would have taken only a moment to produce an HTML version.
I believe all *normative* specification-related documents from IDPF
should be XHTML anyway, but that's another issue.)
I'd like to say more -- since the Container Working Group has made
substantial progress in terms of what the Container format will
probably end up being.
However, I'm not certain as to IDPF's policy regarding the release of
the technical decisions that have been made so far. So, I'll leave
that for Garth Conboy to clarify.
In my opinion, the entire working group activity, including the
discussion list, the minutes to every meeting, and intermediate
documents of the Container solution, should be completely open to
the public. I don't believe they are yet. Garth? (Unfortunately, the
discussion list does not even appear to be auto-archived, at least in
a publicly-accessible way. The only archives are those which someone
keeps when receiving each article by email.)
Anyway, despite these small criticisms, I applaud IDPF and the
Container WG for releasing the requirements document for public review.
It gives the public time to submit more requirements before the final
spec is etched in concrete.
Jon Noring
> Tackling first the "use XML to encapsulate the resources" comment:
>
> There are two ways to think of using a "master" XML document to
> contain a publication:
>
> 1) To encapsulate or contain a set of file resources (where the
> file resources comprise the Publication in some defined
> framework such as OpenReader and OEBPS.) It would be the XML
> equivalent of multi-part MIME. (This appears to be csssite's
> proposal.)
It is not a proposal, but an existing, completely standard approach that
people already use. There is nothing new in it at all. See for example
http://my.opera.com/community/dev/operashow/generator.html
It outputs a single XHTML document with the images, style sheets and
JavaScript embedded inside.
> And look at it this way: All Prince does (via PDF) is to "splash"
> glyphs onto a defined "page." This page could just as well be a
> screen rather than a Postscript file.
Yes, it could be. But it is not. Otherwise it would be the end of all
problems, as Prince has a good, entirely standards-based,
publishing-oriented rendering engine. Once you have a good engine, the
'bells and whistles' can be added at any time.
/* But one can marry Prince with some relatively light PDF implementation
like Foxit Reader and get some kind of standards-based reader
application in this way */
> From my conversations with Mark
> Carey at OSoft, the next-gen ThoutReader will be totally modular, so
> they can easily plug in different rendering engines as needs arise (I
> think they plan to use the Mozilla rendering engine wherever they
> can, which will improve the typography and CSS support over the Java-
> based one they currently use.)
Gecko is not really strong when it comes to paged media. This is an area
where Prince beats Gecko, Presto and KHTML, which are otherwise good
candidates for an eBook reader.
csssite wrote:
> One point that I don't fully understand is why an enveloping standard
> that basically unifies existing technologies (XML, XHTML, CSS, SVG, etc.)
> is necessary. Isn't an XML + CSS framework sufficient for eBooks? If
> multiple XML/CSS/PNG files are not what you need, instead of adding an
> OpenReader wrapper you can just merge the XML documents, embed style
> sheets and SVG inside, and embed PNG/JPG images as data URIs in the
> document. Looks like a good eBook format to me. Maybe I'm missing
> something.
Good question.
Tackling first the "use XML to encapsulate the resources" comment:
There are two ways to think of using a "master" XML document to
contain a publication:
1) To encapsulate or contain a set of file resources (where the
file resources comprise the Publication in some defined
framework such as OpenReader and OEBPS.) It would be the XML
equivalent of multi-part MIME. (This appears to be csssite's
proposal.)
2) To form the Publication framework itself.
(The differences between these two can get fuzzy, but hopefully it
makes sense. (1) is more of a generic wrapper which doesn't care
what's inside -- that is, what's inside exist independent of the XML
wrapper, while in (2) the outer XML document is integral with the
Publication framework resources -- many of these resources do not
exist independent of the wrapper. For example, the DTD/Schema of this
outer XML document includes all the tags to markup the publication
content itself.)
For (1), the IDPF Container WG has been studying various encapsulation
technologies and is definitely NOT settling upon XML encapsulation of
OEBPS Publications (nor on multi-part MIME). So there does not
seem to be any significant advantage to (1), and I do see a few
disadvantages which I won't delve into. (Hint: refer to what both the
Open Document Format people at OASIS and the Microsoft XPS folk
chose to use for encapsulating XML frameworks: ZIP, not XML.)
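(To illustrate ZIP-style encapsulation in general terms, here is a minimal Python sketch using the standard zipfile module. The resource names are made up, and this is not the layout of the IDPF, ODF or any other real container spec; it only shows that the resources stay independent files inside the archive.)

```python
import io
import zipfile

def build_container(resources):
    """Pack a {name: bytes} mapping of resources into an in-memory ZIP."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, data in resources.items():
            zf.writestr(name, data)
    return buf.getvalue()

def read_resource(container, name):
    """Pull a single resource back out without touching the others."""
    with zipfile.ZipFile(io.BytesIO(container)) as zf:
        return zf.read(name)

# Hypothetical publication: file names are assumptions for illustration.
publication = {
    "binder.xml": b"<binder/>",
    "chap1.xml": b"<html/>",
    "style.css": b"p { margin: 0 }",
}
archive = build_container(publication)
```

Each file can be extracted, edited and replaced on its own, which is the property the wrapper-in-XML approach gives up.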
For (2), there are already ebook implementations taking this approach.
Both Sony's BBeB Xylog and the Russian FictionBook2 (used in the Nokia
web pad device, an ebook reader Roger Sperberg writes a lot about)
employ this approach.
There are several downsides to (2). It makes it harder to build
publications in a modular fashion. With the OEBPS paradigm (which
OpenReader embraces), the Publication is built by collecting a bunch
of independent resources which can be reused in different combinations
(e.g., refer to the OASIS DITA project.) If everything is embedded at
a fundamental level into a single XML document then the ease of
modularity is reduced. To edit a particular part of the content
requires the whole XML document to be edited; if the resource is a
mimencoded image or multimedia, to change or edit it requires it be
removed from the XML, mimedecoded, edited, then reencoded and
reinserted into the XML instance.
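(The decode/edit/re-encode cycle can be sketched in a few lines of Python. This assumes the image is embedded as a base64 data URI, as csssite suggested; the helper names and the sample bytes are illustrative only.)

```python
import base64

def to_data_uri(data, mime_type):
    """Encode raw bytes as a base64 data URI for embedding in markup."""
    payload = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{payload}"

def from_data_uri(uri):
    """Recover the raw bytes from a base64 data URI."""
    header, payload = uri.split(",", 1)
    if not header.endswith(";base64"):
        raise ValueError("only base64 data URIs are handled in this sketch")
    return base64.b64decode(payload)

# Editing an embedded image means a full round trip:
original = b"\x89PNG fake image bytes"       # placeholder, not a real PNG
embedded = to_data_uri(original, "image/png")
edited = from_data_uri(embedded) + b" (edited)"
embedded_again = to_data_uri(edited, "image/png")
```

With separate files in a container, the same edit is just a matter of replacing one file.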
In addition, there *might* be a patent encumbrance of the "all-in-one
XML document" publication framework. I remember Microsoft threatening
patent infringement a few years ago on the Open Office Document format
which was taking this approach -- as a result, I believe ODF moved to
a multi-file approach using ZIP as the container of the resources. I
think this will turn out to be a positive move for ODF. The
"all-in-one" XML approach is quite limiting (and witness Microsoft
itself moving away from that in its new XPS spec.) (If my historical
perspective of the Microsoft threat is incorrect, let us know.)
> Another issue is implementation. Apparently current browsers can't
> print XML + CSS decently. The Prince formatter does a good job here, and
> in addition to Prince some kind of publishing-oriented reader/browser
> would be really useful. In this respect OSoft's decision to supply a
> standards-based reader application sounds interesting.
Prince is a great product! And Prince is a proud supporter of
OpenReader's vision. CSS is more powerful for print formatting than
many people give it credit for, as Prince amply demonstrates.
And look at it this way: All Prince does (via PDF) is to "splash"
glyphs onto a defined "page." This page could just as well be a
screen rather than a Postscript file. As Prince continues to develop
its PDF capabilities, it is also building one hell of a rendering
codebase for OpenReader.
Of course, Prince does not have the auxiliary "bells and whistles" an
ebook reading system requires (there's more than just splashing glyphs
onto a screen), like what OSoft's ThoutReader has. Marry the two
together, and one will have a state-of-the-art ebook reading system
with killer typography far exceeding that of web browsers, plus the
functionality/features of ThoutReader. From my conversations with Mark
Carey at OSoft, the next-gen ThoutReader will be totally modular, so
they can easily plug in different rendering engines as needs arise (I
think they plan to use the Mozilla rendering engine wherever they
can, which will improve the typography and CSS support over the Java-
based one they currently use.)
Remember, OpenReader wants a "thousand flowers to bloom" (quoting or
misquoting Mao). Even though OSoft is the first company to commit to
rendering OpenReader Publications (and will do so using an *open
source* product!), we invite other companies to build and market their
own OpenReader user agents. That is, quoting Samuel Gompers: "We
Want More."
Interestingly, Mark at OSoft *wants* such competition, because this
will benefit his company by more quickly establishing the OpenReader
standard in the marketplace. He's *open sourcing* their ThoutReader to
enable such competition. It sounds weird, but recent history shows
that there are viable business models for companies to open source
their core products. Of course, I don't expect all companies to
open source their products -- if Adobe were to build an OpenReader
or OEBPS-capable reading system, I doubt they will open source its
codebase.
Jon Noring
One point that I don't fully understand is why an enveloping standard
that basically unifies existing technologies (XML, XHTML, CSS, SVG, etc.)
is necessary. Isn't an XML + CSS framework sufficient for eBooks? If
multiple XML/CSS/PNG files are not what you need, instead of adding an
OpenReader wrapper you can just merge the XML documents, embed style
sheets and SVG inside, and embed PNG/JPG images as data URIs in the
document. Looks like a good eBook format to me. Maybe I'm missing
something.
Another issue is implementation. Apparently current browsers can't
print XML + CSS decently. The Prince formatter does a good job here, and
in addition to Prince some kind of publishing-oriented reader/browser
would be really useful. In this respect OSoft's decision to supply a
standards-based reader application sounds interesting.
--- Jon Noring wrote:
>
> What
> should OpenReader's standards development strategy be if different
> from what it is currently?
>
I've communicated my opinions to Jon in various private email
messages, but I want to say in this forum that individual initiative
and publicly declared rules of engagement are both necessary for a
standard (lower-case "s") to emerge.
I suggest moving the whole OpenReader discussion and development to
OASIS, oasis-open.org, where there are rules, procedures and open
access to all the discussion and decisions.
When the Open eBook Forum was writing the first OEB publication
structure spec, the working group members leaned on existing recs,
specs and standards to base their spec on. Whatever the OpenReader
effort creates, whether it is widely implemented or widely ignored,
can only be used by subsequent efforts -- even an OEB PS 2 -- if there
is some formal body that issues it. A building block has to be stable
for people to build on it, and any spec controlled by an individual or
a company will always appear to be subject to change at a whim and not
from consensus.
Let's get this onto an official track, I say.
Without funding or full-time staffing, the OpenReader "consortium"
cannot play this role and so it's time to move to the next alternative.
Agree? Disagree?
-- Roger Sperberg
firstinitial lastname yahoo
Everyone,
Bill McCoy at Adobe and I have exchanged blog articles regarding
OpenReader's current "standards development strategy". For those
who've not heard about OpenReader, a proposed universal open standard
for ebook and digital publication formats, refer to OpenReader's home
site: http://www.openreader.org/ .
Rather than explain our differing views here, refer to the articles:
Bill's latest:
http://blogs.adobe.com/billmccoy/2006/01/the_forked_road.html
Which is a reply to mine (fairly long):
http://www.teleread.org/blog/?p=4201
Those here interested in ebook formats, open standards, and all of
that, I urge you to read the above two articles and then share your
thoughts here for discussion. Does Bill have a strong case? What
should OpenReader's standards development strategy be if different
from what it is currently?
Thanks!
Jon Noring
OpenReader Consortium
Everyone,
Appended below is the first sample "Binder" document. It represents a
quite complex example publication to illustrate the various supported
constructs (from this a DTD/Schema could be built.) Of course, simpler
publications will lead to a much simpler looking Binder -- in this
example I've pulled out most of the stops.
The purpose of the Binder document, which is the core (the essence if
you will) of the OpenReader format, is to "bind" together a set of
supported resources (i.e., content documents, style sheets and images)
to create a coherent Publication, allowing for powerful user agent
functionality and features, plus other benefits.
For those familiar with the OEBPS "Package" document (which serves a
similar purpose), the OR Binder will look vaguely familiar, but has
more stuff and is organized a little differently. For those wanting to
upgrade OEBPS Publications, the OEBPS "Package" appears (almost) fully
transformable into an OR Binder, although some stuff will need to be
added after the fact (such as a required "table of contents" in the
<navigation> section which is not required in OEBPS -- further info on
this below.)
The "more stuff in the OR Binder" includes many improvements which
publishers, user agent developers, accessibility advocates, and
others for several years have recommended be added to OEBPS. For
example, I believe the important requirement of "internationalization"
is now mostly solved -- what remains to do should be easy to add in
the future given the general design of the Binder. I also believe that
inter-publication linking is solved as well, when also considering the
"orp:" IRI scheme, which in a separate message I'll soon post about
(thanks to Lee, Peter, and others, who provided valuable feedback to
help me resolve the knotty issues of identifier namespaces.)
Here are a few of the highlights of the OR Binder (primarily compared
with the OEBPS "Package"):
1) An OR Publication can be defined in two flavors (modes):
a) "oeb" -- follows the OEBPS paradigm (spine, out-of-spine, etc.)
b) "web" -- essentially follows the "web" paradigm (i.e., home
page with a bunch of interlinked pages. This allows
representation of conforming "web sites", a type of
digital publication.)
2) All resources are given a "resource id" ("resid"). The 'resid' is
to be used for all linking purposes (both internal and external) in
the Publication using the "orp:" IRI scheme. The path/name of the
resource is NOT to be used for any other purpose -- a major change
from OEBPS and web practice. Nevertheless, there are a few strong
reasons for this paradigm shift, which make it both necessary and
more powerful.
3) Even though there is still a "main" spine, there can now be
alternate joining together of documents for "other" spines. For the
"oeb" mode, these "other spines" simply become "out-of-main"
content (similar to the OEBPS "out-of-spine".)
There is also an optional <linear> spine, where all the content
documents (including out-of-main) can be listed in some order for
printing purposes. Without this, "out-of-main" content documents
are sort of "lost", and the user agent has to guess where to place
them when linearizing the publication for printing and similar
purposes. User agents, for ordinary electronic presentation, must
not use <linear>.
4) Element substitution. It is now possible to assign image resources
to particular elements within content documents, such as to tables,
lists and other complex markup structures. This is very useful for
certain limited horsepower hardware, and now gives the mechanism by
which MathML and SVG can be cleanly incorporated into OpenReader
and allow "fallbacks" to image resources for user agents which are
not yet SVG and MathML capable. (It is difficult to cleanly add SVG
and MathML markup into OEBPS -- the <object> tag is inappropriate
for this purpose according to the MathML/SVG communities, thus the
need for element substitution from the Binder. Of course, we need
to become XHTML-agnostic, thus relying on <object> is problematic,
the reason why it is removed in the Basic Content Document 1.0.)
5) Multiple primary (and other) user languages may be supported, along
with an independent set of certain Binder constructs ("usersets")
for each supported language.
6) A centralized way to add long descriptions (for each userset) to
any publication resource. Especially useful for accessibility of
images (equivalent to a "super longdesc" for <img> in XHTML.)
7) All CSS styling is now moved to the Binder. I believe the mechanism
for applying style sheets is innovative and powerful. It now allows
a clean way to apply style sheets to content documents of other
non-XHTML vocabularies. CSS will no longer be recognized in any
content documents. All CSS styling must be moved to external CSS
style sheets which are applied from the Binder, and not from
documents. There are now tools to assist in migrating CSS-laden
legacy (X)HTML (especially documents using the horrid 'style'
attribute and the various legacy presentational elements) into
external style sheets.
(This is probably the biggest "revolution" compared to OEBPS.)
(This feature is incomplete. I need to figure out the best way to
support multiple style sheet sets for specific purposes, such as
for particular media groups/types (see sec. 7.3.1 in CSS 2.1) and
particular user agents. Your suggestions and insights are invited.)
8) A required machine-readable "Table of Contents" (the primary
navigational structure in <navigation>.) Also included is similar
support for secondary/other lists and tours (like the OEBPS Tours).
(This is the next biggest "revolution" compared to OEBPS.)
This is an important requirement of the accessibility community,
but is also highly asked for by publishers, user-agent developers
and end-users. This allows user agents to present "table of
contents" (and other similar lists of publication links such as
"lists of illustrations") external to publications. That is,
navigation is no longer a part of content included by publishers
(and done in no uniform manner), but outside of it, in a uniform
and accessible system, where it really belongs. Omission of this
was, in my opinion, the biggest mistake in OEBPS when first
formulated in 1999 (understandable why at the time, but unfortunate
nevertheless.)
The system chosen follows the suggestions of older and newer
Digital Talking Book Specifications. The agony I had was whether
hierarchical document structure was to be represented in the
primary navigation as a nested set of XML elements (as is done in
the latest DTBook), or in a "flat" manner with 'level' attribute to
describe the associated document hierarchical level (as is done in
older DTBook specs.) After consulting with several in the DTBook
community, and weighing the pros and cons (and understanding why
they did what they did when they did), I chose the "flat" approach.
I'll be happy to share the reasons in another message. If there's
majority desire to move to a nested structure for the primary table
of contents, we'll do so.
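
To illustrate the "flat" approach described in item 8, here is a minimal Python sketch showing that a flat list of pointers carrying a 'level' value can be turned into a nested structure mechanically, so a user agent does not lose the hierarchy. The names Node and nest are hypothetical and not part of any OpenReader specification; the (level, label) pairs stand in for <pointer> elements.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One table-of-contents entry; 'children' holds deeper-level entries."""
    label: str
    level: int
    children: list = field(default_factory=list)


def nest(flat):
    """Convert (level, label) pairs, given in document order, into a tree."""
    roots = []
    stack = []  # currently open ancestors, shallowest first
    for level, label in flat:
        node = Node(label, level)
        # Close every open entry that is not a strict ancestor of this one.
        while stack and stack[-1].level >= level:
            stack.pop()
        (stack[-1].children if stack else roots).append(node)
        stack.append(node)
    return roots


# Hypothetical flat table of contents, mirroring the 'level' attribute idea.
flat_toc = [
    (1, "Introduction"),
    (1, "Chapter 1"),
    (2, "1.1"),
    (3, "1.1.1"),
    (1, "Chapter 2"),
]
tree = nest(flat_toc)
```

The conversion needs only a single pass and a stack, which is one reason a flat list is not necessarily a loss compared with nested markup.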
*****
Despite the innovations, the current Binder is still incomplete.
Here's a list of features we should consider adding in the future
(some immediately, some can wait a while.) I chose not to add them now
either because they were of lower priority, or we need to figure out
how to properly implement them. The (draft) core base of the Binder is
now laid (subject to final discussion and tweaking), and these other
features can be built on top of this base when we feel the base is
stable enough.
1) Right now only four basic media types are supported with no
allowance for any others. Multimedia support needs to be considered
as soon as possible once the current Binder is reasonably stable.
XLink may be used in content documents to embed multimedia.
2) Right now only one Publication Identifier is allowed in <pubid>.
But there is definitely a need for "alternate" identifiers and
"back" identifiers to allow Publication authors who have to tweak
content a little and/or have to change identifiers, to not "break"
already existing links into the Publication from the outside. The
details have to be worked out.
(Sort of like "cross-referencing" of pubid's.)
3) Right now only Dublin Core metadata is supported, but as is done
in OEBPS, we may want to add support for publisher-supplied
metadata (e.g., OEBPS has the <x-metadata> element), as well as a
reference to external metadata file(s) (e.g. MARC/XML.)
4) Within the spines, we may want to consider allowing other resource
media types besides content documents (currently only one type of
content document is specified but there will be more). (E.g., like
illustration plates in older books which are bound totally separate
from the printed sheets.)
5) Font embedding. Several issues need resolving, but we'd like to
allow publishers to embed fonts (and font fragments) in the
Publication itself.
6) Cover Art support. Cover art may include not only images but other
supported multimedia. It is important to allow a range of cover art
so user agents may pick the appropriate one(s).
7) Extend the <international> section to include info on the glyph/
character sets used in the content documents. This allows User
Agents to tell the end-user if there are glyphs that cannot be
represented on the user's current system. (Or, during the sale of
an OR formatted ebook, the glyph requirements can be presented to
the buyer.)
8) Digital signature support (and maybe checksum support as well?)
9) XInclude capability (requires some Binder enabling.) XInclude
support turned out to be tougher than I thought as I considered how to
make it work in an inter-publication and external linking
environment (especially with regards to element id's). This is not
trivial. This should wait awhile until the OpenReader Publication
Framework is released and stable.
10) Allow CSS styling to be applied to XHTML namespaced content in the
Binder. As will be noticed in the example Binder below, various
types of content in the Binder (such as Dublin Core metadata
information) allow the inclusion of a few XHTML namespace inline
elements to aid in rudimentary formatting by the user agent (which
the user agent may ignore.)
11) May add something similar to OEBPS Guides to the <navigation>
section. Thoughts?
12) Expand the <substitute> section to associate resources with other
resources. This is for two reasons:
a) "fallbacks" for unsupported media types to core-types (like
is done in OEBPS.) Since the current draft does not support
allowing other media types, it's not an issue. But as soon as
we discuss multimedia support and support for non-core media
types, this becomes an important issue.
b) We may allow a resource to simply have alternatives. For
example, a publisher may wish to include several alternatives
to a particular image, which the user agent may select from
based on image parameters.
That's all folks!
Enjoy Binder version "0.01" below.
Jon Noring
(First draft sample Binder document, 19 January 2006. The DTD has
not yet been generated.)
**********************************************************************
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE binder PUBLIC
"-//OpenReader//DTD Binder Document 1.0//EN"
"http://openreader.org/dtd/binder10.dtd">
<!-- Example Binder Document, 19 January 2006. It is very likely this
example will change in several ways before being finalized for
the Binder 1.0 specification. But, overall, this example of a
quite complex publication includes the current thinking of most
planned features of the Binder. Those who understand the OEBPS
1.x Package Document will see several similarities. -->
<binder mode="oeb"
xmlns="http://openreader.org/namespaces/orp-binder/1.0/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:xhtml="http://www.w3.org/1999/xhtml">
<pubid>
<primary idns="urn:uuid">6a2014b0-87a2-11da-a72b-0800200c9a66</primary>
</pubid>
<resources>
<item resid="intr1"
resource="intro.xml"
media-type="application/x-orp-bcd1+xml"
comment="This is a special introduction written by Jane Doe"/>
<item resid="intr2"
resource="intro.xml"
media-type="application/x-orp-bcd1+xml"/>
<!-- Note above that the same content document resource,
"intro.xml" is given two different resid's: 'intr1' and
'intr2'. This allows one resource to act like two or
more. This feature will probably be only rarely used,
but is nonetheless useful in certain situations. -->
<item resid="chap1"
resource="chapter1.xml"
media-type="application/x-orp-bcd1+xml"/>
<item resid="chap2"
resource="chapter2.xml"
media-type="application/x-orp-bcd1+xml"/>
<item resid="note1"
resource="note1.xml"
media-type="application/x-orp-bcd1+xml"/>
<item resid="note2"
resource="note2.xml"
media-type="application/x-orp-bcd1+xml"/>
<item resid="css-a"
resource="cssdir/a.css"
media-type="text/css"/>
<item resid="css-b"
resource="cssdir/b.css"
media-type="text/css"/>
<item resid="css-c"
resource="cssdir/c.css"
media-type="text/css"/>
<item resid="css-d"
resource="cssdir/d.css"
media-type="text/css"/>
<item resid="imag1"
resource="images/image1.png"
media-type="image/png"/>
<item resid="imag2"
resource="images/image2.jpg"
media-type="image/jpeg"/>
<item resid="tabl1"
resource="images/table1.png"
media-type="image/png"/>
</resources>
<spines>
<main residrefs="intr1 chap1 chap2" entry="intr1#h01"/>
<!-- There must be one, and only one, <main> element.
<other> and <linear> are optional. If the 'mode'
attribute in <binder> has the value of "web",
then the concatenated content defined by <main>
is essentially like a web site "home page", thus
the reason calling it <main> rather than <spine>
as done in OEBPS. -->
<other residrefs="intr2 note1 note2" entry="note1#n01"/>
<linear residrefs="intr1 chap1 chap2 note1 note2"/>
</spines>
<substitute>
<elements>
<element ref="chap1#table01" residrefs="tabl1"/>
</elements>
</substitute>
<!-- <elements> points to elements in content documents where the
publisher may provide optional alternative resources for User
Agents to choose from and use in lieu of rendering the element
contents. This is especially useful for complex tables, and
for future expansion to SVG and MathML which some reading
systems may not be able to adequately support, at least in the
short term. At present, 'residrefs' only to images (type PNG
and JPEG) are supported -- other resource types are ignored.
Even if User Agents can render the native content of the
element, they should provide the option for the end-user to
view the alternative image(s). -->
<international>
<publang primary="en-US"
secondary="de-DE fr-FR"/>
</international>
<!-- <publang> is required, as is the 'primary' attribute. The
value of 'primary', which can be one or more language/
country codes (if more than one, separated by white space),
lists the primary language(s), or intended/target audience.
Only rarely will two primary languages be given, an example
being a bilingual dictionary that speakers of both languages
can mutually use. -->
<userset lang="en-US" xml:lang="en-US">
<!-- There must be at least one <userset>. For Binder 1.0,
<userset> is dependent only upon target language; in future
versions other purposes may be permitted. There should be a
<userset> for each primary language in <publang>. But
publication authors may supply alternative <userset> for
languages other than the primary one(s). 'lang' refers to
the target language of the <userset>, while xml:lang is
the internal language used by the markup in <userset>. In
nearly all cases the value in xml:lang should be the same as
in 'lang'. -->
<metadata>
<dublincore>
<dc:title>Title of <xhtml:em>This</xhtml:em> Book</dc:title>
<dc:title>An Optional Subtitle</dc:title>
<!-- at least one dc:title required, all of which must
appear first; the remaining dc:* elements are optional
in Binder 1.0 (in OEBPS, dc:identifier and dc:language
are also required, but in Binder are taken care of
elsewhere.) The following examples appear in
alphabetical order by element name, but may be in any
order: -->
<dc:contributor role="ill" extref="http://www.acmeegraphics.com/"
ref="intr1#p025">Acmee Graphics</dc:contributor>
<dc:coverage>France and Germany, 19th Century</dc:coverage>
<dc:creator role="aut" file-as="Doe, John"
extref="http://www.johndoe-author.com/">John Doe</dc:creator>
<dc:creator role="aui" file-as="Doe, Jane">Jane Doe</dc:creator>
<dc:date event="published">2006-06-17</dc:date>
<dc:description ref="intr1#p005 intr1#p010">This is a
<xhtml:strong>shorter</xhtml:strong> description.</dc:description>
<dc:format>OpenReader Publication 1.0</dc:format>
<dc:identifier
idns="urn:uuid">6a2014b0-87a2-11da-a72b-0800200c9a66</dc:identifier>
<dc:language type="primary">en-US</dc:language>
<dc:language type="secondary">de-DE</dc:language>
<dc:language type="secondary">fr-FR</dc:language>
<dc:publisher extref="http://www.acmeepublishing.com/">Acmee
Publishing</dc:publisher>
<dc:relation
extref="orp:/urn:uuid:b04155f8-54ce-4963-8f13-285839378e7d/">Title of a Related
Book</dc:relation>
<dc:rights>Copyright 2006, Acmee Publishing</dc:rights>
<dc:source>From a book first published in 1888.</dc:source>
<dc:subject scheme="lcc">HQ470.S3</dc:subject>
<dc:type>Text</dc:type>
</dublincore>
</metadata>
<resdescs>
<resdesc residref="chap1">This <xhtml:em>first chapter</xhtml:em> is
written by John Doe.</resdesc>
<resdesc residref="chap2">This second chapter is written by John
Doe.</resdesc>
<resdesc residref="imag1" ref="chap1#p080 chap1#p090">John Doe is a
wonderful person as seen in this portrait of him.</resdesc>
<resdesc residref="imag2" ref="chap2#p065 chap2#p075"
extref="http://www.janedoe-author.com/">Jane Doe, like John Doe, is a wonderful
person as seen in this portrait of her.</resdesc>
<resdesc residref="tabl1">Image representation of Table 1</resdesc>
</resdescs>
<!-- The optional <resdescs> above is for describing or
amplifying certain publication resources. For enhancing
accessibility, a resource description should be included
for each image resource. On end-user demand, user agents
should present the contents of <resdesc> for images. -->
<styling>
<styleset type="primary">
<header>Acmee Publishing's Default Style</header>
<styleapply cdrefs="intr1 chap1 chap2" cssrefs="css-a css-b"/>
<styleapply cdrefs="intr2 note1 note2" cssrefs="css-c"/>
</styleset>
<styleset type="alternate">
<header>Cool Styling By John Doe Himself</header>
<styleapply cssrefs="css-d"/>
<styleapply cdrefs="intr2" cssrefs="css-c"/>
</styleset>
</styling>
<!-- For <styling>, if cdrefs not given, it is assumed all
content documents in the Publication (i.e., all
resid's for content documents) use the same set of CSS
style sheets given in cssrefs. Order in cssrefs is
significant, applied in cascade fashion. So, too, with
the order of <styleapply>; and a resid can appear in
more than one <styleapply> in a <styleset>. User
agents should allow the end-user to select and apply
alternative stylesets upon demand. -->
<navigation>
<primary>
<header>Table of Contents</header>
<desc>This is the <xhtml:strong>Table of Contents</xhtml:strong> for
this book.</desc>
<pointer level="1" ref="intr1" targetclass="frontmatter section">
<header>Introduction</header>
<desc extref="http://www.janedoe-author.com">The
<xhtml:em>Introduction</xhtml:em> written by Jane Doe.</desc>
</pointer>
<!-- There are no standardized values for the optional
attribute 'targetclass', but the publication author
should use some uniform system. User agents may use
the value of targetclass, but are not required to do
so. -->
<pointer level="1" ref="chap1" targetclass="chapter">
<label>Chapter 1</label>
<header>This is the <xhtml:em>First</xhtml:em> Chapter</header>
<desc>This first chapter is written by John Doe.</desc>
</pointer>
<pointer level="2" ref="chap1#sec1.1" targetclass="section">
<label>1.1</label>
<header>A Section in a Chapter</header>
</pointer>
<pointer level="3" ref="chap1#sec1.1.1" targetclass="sub-section">
<label>1.1.1</label>
<header>A Sub-Section in a Section</header>
</pointer>
<pointer level="1" ref="chap2" targetclass="chapter">
<label>Chapter 2</label>
<header>This is the <xhtml:em>Second</xhtml:em> Chapter</header>
<desc>This second chapter is written by John Doe.</desc>
</pointer>
</primary>
<!-- There must be one, and only one, <primary> in
<navigation>. The value of 'ordered' must be 'yes', which
is the default. Hierarchy (using 'level') is optional but
*should* be used when the document structures being
targeted are themselves hierarchical.
<secondary>, see below, may appear any number of times,
and may be hierarchical or flat, either of type "list"
or "tour". Ordered/unordered needs to be resolved as to
when it can be used and what it means for user agent
navigation. -->
<secondary type="list">
<header>List of Illustrations</header>
<desc>This is the complete list of illustrations in this
book.</desc>
<pointer ref="chap1#image01">
<label>Illustration 1</label>
<header>Portrait of John Doe</header>
<desc ref="chap1#p080 chap1#p090">John Doe is a wonderful person
as seen in this portrait of him.</desc>
</pointer>
<pointer ref="chap2#image02">
<label>Illustration 2</label>
<header>Portrait of Jane Doe</header>
<desc ref="chap2#p065 chap2#p075">Jane Doe, like John Doe, is a
wonderful person as seen in this portrait of her.</desc>
</pointer>
</secondary>
<secondary type="tour" ordered="no">
<header>Cities Visited</header>
<pointer ref="chap1#p020">
<header>Berlin</header>
<desc>Berlin is a <xhtml:em>fun</xhtml:em> place to visit!</desc>
</pointer>
<pointer ref="chap2#p126">
<header>Rome</header>
</pointer>
<pointer ref="chap1#p98">
<header>Paris</header>
</pointer>
</secondary>
</navigation>
</userset>
</binder>
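
As a rough illustration of how a user agent might read the draft Binder above, the following Python sketch resolves the 'residrefs' of <main> to resource paths. It uses a trimmed copy of the example (the draft is not final and has no DTD yet), and the function name main_spine_paths is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the example Binder; the prefix "b" is arbitrary.
NS = {"b": "http://openreader.org/namespaces/orp-binder/1.0/"}

# Trimmed copy of the draft example, kept small so the sketch is self-contained.
BINDER = """<binder mode="oeb"
    xmlns="http://openreader.org/namespaces/orp-binder/1.0/">
  <resources>
    <item resid="intr1" resource="intro.xml"
          media-type="application/x-orp-bcd1+xml"/>
    <item resid="chap1" resource="chapter1.xml"
          media-type="application/x-orp-bcd1+xml"/>
    <item resid="chap2" resource="chapter2.xml"
          media-type="application/x-orp-bcd1+xml"/>
  </resources>
  <spines>
    <main residrefs="intr1 chap1 chap2" entry="intr1#h01"/>
  </spines>
</binder>"""


def main_spine_paths(binder_xml):
    """Return the resource paths of the main spine, in reading order."""
    root = ET.fromstring(binder_xml)
    # Map each resid to the path of the resource it names.
    by_resid = {item.get("resid"): item.get("resource")
                for item in root.findall("b:resources/b:item", NS)}
    main = root.find("b:spines/b:main", NS)
    # 'residrefs' is a whitespace-separated list of resids.
    return [by_resid[resid] for resid in main.get("residrefs").split()]


paths = main_spine_paths(BINDER)
```

Note that the lookup goes through the resid only; the path is consulted just once, at the end, in keeping with the rule that paths are not used for linking.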
> In essence, it says that IRIs are URIs that 1.
> can use the entire Unicode character set instead of just the ASCII
> character set (a restriction which, if it in fact exists, has been
> almost universally ignored) and 2. can be encoded with utf-8 or utf-16
> without octet encoding into ASCII (I think Mr. Janssen would argue that
> this has always been the case).
And Mr. Passey would, again, be both wrong and right in his
supposition of what I might say, because of his apparent inability to
say what (I suppose) he means. Yes, any RFC 3986 URI string can indeed
"be encoded with utf-8", because it is an ASCII string, and UTF-8 is a
superset of ASCII. I would indeed argue that this has always been the
case. What's more, if you had the bits of a URI encoded with UTF-8,
and the bits of that URI encoded with US-ASCII, you couldn't tell one
from the other! And since the characters in UTF-16 are also a
superset of what's allowed in URIs, it could also be true there,
depending on your comparison technique. The octets wouldn't be
the same, but the characters represented would be.
But I hope that's not really what he *meant* to say, because it would
be a witless observation. I think he meant to say that the IRI spec
allows non-ASCII characters to occur in something much like a URI
string without being percent-escaped. Nothing to do with UTF-8 or
UTF-16. And I certainly would never argue that *that* has always been
the case.
Basically, the IRI spec extends the "unreserved" set of characters
from what's specified for URIs. The "unreserved" characters are those
not required to be escaped or treated specially in a URI. That set
includes only ASCII letters, digits, and a few punctuation marks. In
an IRI, by contrast, every Unicode character that's not in ASCII is
added to the "unreserved" set. (Well, almost every character -- some
"private" character codes are reserved in the upper registers for
query use only.)
But the specific octet encodings used in the representation of an IRI
will also be constrained by the use of the IRI, by the scheme of the
IRI, and by the context in which the IRI appears. Perhaps the most
useful part of the IRI spec is that it specifies a standard way of
encoding or escaping non-ASCII characters in URIs. I think that will
be widely used.
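
A small Python sketch of the two points above, using only the standard library. The example address and the helper name iri_to_uri are made up for illustration, and the helper ignores host names, which RFC 3987 treats separately, so it is a simplification rather than a full implementation of the mapping.

```python
from urllib.parse import quote

# A plain URI is pure ASCII, so its ASCII and UTF-8 encodings are identical.
uri = "http://example.org/books/chap1.xml#p020"
same_bytes = uri.encode("ascii") == uri.encode("utf-8")

# In UTF-16 the characters are the same but the octets are not.
utf16_differs = uri.encode("utf-16-le") != uri.encode("utf-8")


def iri_to_uri(iri):
    """Percent-escape the UTF-8 octets of every non-ASCII character.

    ASCII reserved and unreserved characters, and existing '%' escapes,
    are left untouched.
    """
    return quote(iri, safe="%:/?#[]@!$&'()*+,;=~-._")


escaped = iri_to_uri("http://example.org/\u00c1ntonia#p1")
```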
Bill
Lee Passey wrote:
> Jon Noring wrote:
> Boy, Jon, leave it to you to open up yet another can of worms. ;-)
That's my job. <laugh/>
And I want to thank Peter Ring and Chris Lilly for also replying
to this thread. Very useful information. Now on to Lee's reply...
> ... From a practical standpoint utf-8 is
> vastly superior to utf-16 (because all my standard cross-platform
> 'C' string handling routines still work), so let's just agree that
> all non-reserved Unicode characters are acceptable, and that all
> references will be utf-8 encoded, and move on. I'm tired of
> straining at this particular gnat.
I can understand this!
>> The purpose of the ORP IRI Scheme is to properly enable intra- and
>> inter-publication linking (and resource association), into (from
>> the outside world), within and between OpenReader Publications.
> In our particular context, I think the distinction is useful and
> important. If I'm writing the "Annotated Sherlock Holmes", I may
> want to refer to part 2 of the story, "A Scandal in Bohemia." But
> when it comes right down to it, I don't care where the story is
> actually located in cyberspace. I need a URN, not a URL. Now a URL
> will work, so long as the location of the resource doesn't change,
> but the internet is very fluid, and a GUID is forever.
For those wanting to know what a GUID is, here's the Wikipedia entry:
http://en.wikipedia.org/wiki/Guid
Here's the entry for OSF's UUID:
http://en.wikipedia.org/wiki/Universally_Unique_Identifier
> So, if what I need is a URN, and if Info gives me precisely that,
> why not just go with "info:"?
>
> Well, in fact, we _do_ need some location information, so a pure
> identifier is probably not sufficient; not _external_ location
> information, but _internal_ location information. Not only do we
> need to identify a specific publication, we also need to identify a
> specific resource within that publication, and perhaps some specific
> location within the specific resource within the specific
> publication.
Yes, we not only need to know the identity (URN) of the OR Publication,
but to be able to address any spot within the textual content and
related resources. Of course, we also need to be able to address the
OR Publication itself, or provide sufficient information that it may
be machine-located on some space (accessible by the user agent) by
file sniffing. (Thus my interest that the pubid's for all the OR
Publications encapsulated in a "wrapper" be readily accessible.)
> <aside>
> Will we want to include packages inside packages inside
> publications? For example, using the OEBPS as a reference, will we
> want to be able to include something like:
>
> <manifest>
> <item id="book1" href="fellowship.opf"
> media-type="text/x-oeb1-package" />
> <item id="book2" href="twotowers.opf"
> media-type="text/x-oeb1-package" />
> <item id="book3" href="return.opf"
> media-type="text/x-oeb1-package" />
> </manifest>
>
> If so, we will probably want some sort of restriction that slashes and
> colons are not allowed inside identifiers
> </aside>
This is an interesting idea. I had not thought whether we'd want to
consider embedding (nesting) an OR Publication *within* another OR
Publication (independent of the wrapper/container). This is not done
in OEBPS.
What I had considered is that within a wrapper/container/encapsulator
we could have multiple OR Publications, but that they'd be independent
(orthogonal) of each other. That is, any consolidation would be done
at the wrapper level.
So, does anyone see the benefit of letting an OR Publication contain
another OR Publication as a resource component?
I'll be mulling this over. It has a great impact on various fundamental
facets of the OR Framework design, the ORP IRI scheme, and the wrapper
design. Even if we choose not to allow this in OR 1.0 but may at a
future time, the fundamental architecture has to at least be
OR-nesting capable.
> Heck, #fragment-id is allowed in XPointer, so why not just say that
> XPointer expressions are allowed? This is a format specification,
> not a software design document. How much of the XPointer
> specification ends up being supported is an implementation issue,
> not a specification issue.
Agreed. But some people here are not knowledgeable of XPointer, so
describing it in XPointer terms rather than the well-known
"fragment-identifier" may have confused them.
>> Comments, criticisms? Am I missing something critical in the
>> syntax for the ORP IRI scheme?
> The biggest lack I see is a discussion of the syntax for the
> publication and resource IDs; this is where the "info:" schema comes
> back into play.
True. And something that should be discussed.
I do believe for OR 1.0 that we should pretty much leave choosing the
identifier scheme up to publishers. This does not mean we'd be silent
on suggesting what publishers should do to assure uniqueness, but that
we shouldn't force a particular identifier scheme on them. We could
suggest one or more particular ones for publishers who haven't
decided.
> For this system to work in "deep-linking" there has to be some
> assurance of uniqueness in naming.
Definitely!
> Every time I turn around it seems there is some new scheme for
> unique IDs being promoted as the greatest thing since sliced bread.
> The "info:" URI scheme is not an attempt to create Yet Another
> Unique ID, but as a way to consolidate many of the schemes already
> in play. It doesn't have a broad enough scope to rely on entirely,
> but I quite like what it is attempting to do. So I would suggest we
> adopt "info:" as one of the permissible identification schemes.
> Indeed, I think we ought to require that _all_ identifiers (in the
> "orp:" IRN) have a scheme, or namespace, specifier.
Interesting. I have been thinking that for each 'pubid' and 'resid'
attribute in an OR Publication Binder, we'd also include a "scheme"
attribute, viz.:
<binder>
<pubid scheme="istc">0A9-2002-12B4A105-6</pubid>
...
<!-- Aside, should we assign pubid by attribute or by PCDATA
as shown above? -->
...
<manifest>
<item resid="550E8400-E29B-11D4-A716-446655440000"
scheme="uuid"
href="/docs/chap1.xml"
media-type="application/orp-bcd1+xml"/>
...
</manifest>
...
</binder>
However, I'm not sure whether the 'scheme' attribute should be
required, and if we should specify rules how it should be expressed
(e.g., case; should it follow a list of approved values, etc.?)
> As examples, let's use "The Adventures of Sherlock Holmes," which is
> composed of 12 short stories which are in the public domain. Let's
> also say that I write a copyrighted introduction to the collection,
> for which I buy the DOI "10.1000/123." Then I create a new
> publication with the ISBN number 1-234567-890-X. The IRN for the
> introduction might be:
>
> orp:/urn:isbn:1-234567-890-X/info:doi:10.1000%2F123
O.k., now we run up to an issue we need to address.
Specifically, this deals with linking to within an OR Publication (or
"internal" linking.) I'd much prefer that in the allowed syntax for
internal linking we need not include identifier namespaces, viz.
<a href="12345#para1234">link</a>
is acceptable for the longer version:
<a href="x-other:myidscheme:12345#para1234">link</a>
(12345 is a resource id of unregistered id namespace "myidscheme",
such as referencing "chap1.xml")
The first is simple, "looks" like current web practice and by its
syntax (lack of "pubid" and namespace colon) is assumed to be internal
to the publication.
The second is simply needlessly more complicated.
Now, I do see some interesting possibilities if a publisher
reuses some resources in other OR Publications. For example, the
external link:
<a href="orp:/x-other:mypubid:123456/x-other:guid:3F2504E0-4F89-11D3-9A0C-0305E82C3301">link</a>
would point to the same resource as:
<a href="orp:/x-other:mypubid:987654/x-other:guid:3F2504E0-4F89-11D3-9A0C-0305E82C3301">link</a>
Different OR Publications, but same internal resource (so long as
in both Binders the same GUID is specified.)
Hmmmm, very interesting for sure.
Maybe namespace prefixes may be left out for internal references,
but would be required for full "orp:" IRIs?
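
To make the internal/external distinction concrete, here is a hedged Python sketch of how a user agent might split a reference. The "orp:" syntax is still under discussion in this thread, so the layout assumed here (orp:/pubid/resid#fragment) and the function name split_reference are assumptions, not settled rules.

```python
def split_reference(href):
    """Split a link target into publication id, resource id and fragment.

    Assumed layout (not final): "orp:/<pubid>/<resid>#<fragment>" for full
    IRIs, and "<resid>#<fragment>" for references internal to a publication.
    Percent-escapes are left as they are.
    """
    ref, _, fragment = href.partition("#")
    if ref.startswith("orp:/"):
        pubid, _, resid = ref[len("orp:/"):].partition("/")
        return {"kind": "external", "pubid": pubid,
                "resid": resid, "fragment": fragment}
    # No "orp:" prefix: the reference stays inside the current publication.
    return {"kind": "internal", "pubid": None,
            "resid": ref, "fragment": fragment}


internal = split_reference("12345#para1234")
external = split_reference(
    "orp:/x-other:mypubid:123456/"
    "x-other:guid:3F2504E0-4F89-11D3-9A0C-0305E82C3301")
```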
> Note that because the Digital Object Identifier scheme uses slashes
> as part of their naming scheme, whereas we use slashes to indicate
> path components, the slash in the DOI identifier must be percent
> encoded.
Yes, if a pubid and resid include certain characters, they would
have to be appropriately escaped as percent octets or whatever.
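
For example, a sketch in Python of the escaping Lee describes, where the slash inside a DOI is percent-encoded so it is not mistaken for the separator between pubid and resid. The function name build_orp_iri is hypothetical, and the "orp:/pubid/resid" layout is the one proposed in this thread, not a finished syntax.

```python
from urllib.parse import quote, unquote


def build_orp_iri(pubid, resid, fragment=""):
    """Join a pubid and a resid into an "orp:" reference.

    Colons are kept because they separate identifier namespaces; slashes
    inside an identifier are escaped as %2F so that only the separating
    slashes remain literal.
    """
    iri = "orp:/" + quote(pubid, safe=":") + "/" + quote(resid, safe=":")
    if fragment:
        iri += "#" + fragment
    return iri


iri = build_orp_iri("urn:isbn:1-234567-890-X", "info:doi:10.1000/123")
# Reversing the escape recovers the original DOI-based resid.
resid_back = unquote(iri.rsplit("/", 1)[1])
```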
> ISBN numbers are registered as part of the URN naming scheme,
> (http://www.iana.org/assignments/urn-namespaces) and DOI identifiers
> are registered as part of the Info naming scheme
> (http://info-uri.info/registry/OAIHandler?verb=ListRecords&metadataPrefix=oai_dc).
> So far I have not found any scheme which includes GUIDs, which is a
> very useful identifier, and it is possible that some publisher,
> somewhere, may have sufficient disdain for standards to want to use a
> totally unique identification scheme. Taking my lead from MIME, I
> propose we create a new naming scheme, "x-other:" which will be
> totally uncontrolled, but which would require a sub-scheme.
There may also be independent authors who wish to produce an OR
Publication. Here, the author can either just "roll their own", use
a tool (online or application) to obtain a unique identifier, or the
authoring tool they use will generate the unique id's for them in
some non-registered namespace such as GUID or UUID.
> The scheme for GUIDs could be "x-other:guid:", the scheme for SKUs
> could be "x-other:sku:", and the scheme for MicroSoft inventory
> numbers could be "x-other:MSInv:" (For GUIDs, they are so useful I
> think we really ought to lobby the IANA to add GUIDs to the URN
> registry). Holmes's work is in the public domain, and has no ISBN or
> DOI (which is reserved for works that have some intellectual property
> claim) so I may choose to use the Project Gutenberg index number for
> the main work, and an arbitrary GUID for the individual story. Thus,
> to get to part two of "A Scandal in Bohemia" the IRN might be:
>
> orp:/x-other:gutenberg-id:1661/x-other:guid:48123BC4-99D9-11D1-A6B3-00C04FD91555#Part2
Laugh, well this assumes there will be an OR Publication of this PG
work.
> I hope this makes sense.
Yes, it does. A lot of food for thought. Since Peter Ring brought
up info: in the first place, I hope he will reply in this thread.
Also, Chris, Garth, Gary and others are welcome, too. If we get some
sort of consensus, a rough spec can be hammered out. Since I will
do the actual final markup/editing of the specs, this frees up the
others to not worry about the sundry details. Let's get some specese.
Jon
Boy, Jon, leave it to you to open up yet another can of worms. ;-)
I've followed your links, looked at the "info:" scheme, OpenURL, and
even looked at Digital Object Identifiers (now there's an idea that
deserves obscurity). In the end, I think I have to agree with your
proposal with little or no modifications.
Jon Noring wrote:
> Everyone,
>
> Another critical component of the OpenReader specification is to
> define an IRI scheme (where IRI is an internationalized URI per RFC
> 3987, see: http://www.ietf.org/rfc/rfc3987.txt)
This specification is a good example of how it's possible to use lots of
words to say very little. In essence, it says that IRIs are URIs that 1.
can use the entire Unicode character set instead of just the ASCII
character set (a restriction which, if it in fact exists, has been
almost universally ignored) and 2. can be encoded with utf-8 or utf-16
without octet encoding into ASCII (I think Mr. Janssen would argue that
this has always been the case). From a practical standpoint utf-8 is
vastly superior to utf-16 (because all my standard cross-platform 'C'
string handling routines still work), so let's just agree that all
non-reserved Unicode characters are acceptable, and that all references
will be utf-8 encoded, and move on. I'm tired of straining at this
particular gnat.
> The purpose of the ORP IRI Scheme is to properly enable intra- and
> inter-publication linking (and resource association), into (from the
> outside world), within and between OpenReader Publications.
According to RFC 3986,
<blockquote>
A URI can be further classified as a locator, a name, or both. The term
"Uniform Resource Locator" (URL) refers to the subset of URIs that, in
addition to identifying a resource, provide a means of locating the
resource by describing its primary access mechanism (e.g., its network
"location"). The term "Uniform Resource Name" (URN) has been used
historically to refer to both URIs under the "urn" scheme [RFC2141],
which are required to remain globally unique and persistent even when
the resource ceases to exist or becomes unavailable, and to any other
URI with the properties of a name.
</blockquote>
In this paragraph, this RFC is making the distinction between the
identification of something by an arbitrary, but hopefully unique,
identification attribute (a URN), and the identification of something by
a unique location in cyberspace (a URL). (OpenURL expands this to permit
identification of something by non-naming attributes, such as author's
name, publication date, subject, etc.) The RFC goes on to recommend
that, "Future specifications and related documentation should use the
general term "URI" rather than the more restrictive terms "URL" and "URN"."
In our particular context, I think the distinction is useful and
important. If I'm writing the "Annotated Sherlock Holmes", I may want to
refer to part 2 of the story, "A Scandal in Bohemia." But when it comes
right down to it, I don't care where the story is actually located in
cyberspace. I need a URN, not a URL. Now a URL will work, so long as the
location of the resource doesn't change, but the internet is very fluid,
and a GUID is forever.
This problem of creating direct, or "first-class" identifiers was the
motivation for the creation of the "info:" URI scheme: "The info URI
scheme was developed within the library and publishing communities ...
because of the need for URIs as pure identifiers, that is, to identify
(not retrieve, dereference, locate, name, or any of those other things
that URIs do)." While it claims to be a URI scheme, Info would be better
described as a URN scheme, because it makes no guarantees about object
persistence.
So, if what I need is a URN, and if Info gives me precisely that, why
not just go with "info:"?
Well, in fact, we _do_ need some location information, so a pure
identifier is probably not sufficient; not _external_ location
information, but _internal_ location information. Not only do we need to
identify a specific publication, we also need to identify a specific
resource within that publication, and perhaps some specific location
within the specific resource within the specific publication.
<aside>
Will we want to include packages inside packages inside publications?
For example, using the OEBPS as a reference, will we want to be able to
include something like:
<manifest>
  <item id="book1" href="fellowship.opf" media-type="text/x-oeb1-package" />
  <item id="book2" href="twotowers.opf" media-type="text/x-oeb1-package" />
  <item id="book3" href="return.opf" media-type="text/x-oeb1-package" />
</manifest>
If so, we will probably want some sort of restriction that slashes and
colons are not allowed inside identifiers.
</aside>
> The proposed ORP IRI Scheme is actually quite simple:
>
> orp:/pubid/resid[#fragment-id]
>
> { [#fragment-id] is optional }
>
>
> pubid: The unique identifier of the Publication, declared in the
> Binder.
>
> resid: The identifier of the target resource, such as a content
> document. The unique resid is given in the "manifest" in the Binder.
> It is associated with the resource name and optional path (more
> details when the Binder is discussed.)
>
> If the resid is a content document (XML), then any element in the
> document may be given an 'id' and referenced using the
> "fragment-id" (for fragment identifier) syntax. (The
> extension to XPointer schemes is obvious, particularly the useful
> element() scheme which may be supported at a future time.)
Heck, #fragment-id is allowed in XPointer, so why not just say that
XPointer expressions are allowed? This is a format specification, not a
software design document. How much of the XPointer specification ends up
being supported is an implementation issue, not a specification issue.
[Examples snipped]
> Comments, criticisms? Am I missing something critical in the syntax
> for the ORP IRI scheme?
The biggest lack I see is a discussion of the syntax for the publication
and resource IDs; this is where the "info:" schema comes back into play.
For this system to work in "deep-linking" there has to be some assurance
of uniqueness in naming. The "info:" schema seeks to achieve this
uniqueness by allowing for registered namespaces, each of which
presumably specifies its own rules for uniqueness (as well as its own
syntactic rules). For example, partial ISBN numbers are assigned to
publishers, who then are responsible for numbers for their publications;
the numbers are assigned in such a way that an ISBN from one publisher
cannot collide with the ISBN from another publisher (although any
specific publisher can screw up his own publications however he wants).
The same sort of method is used for assigning internet OIDs (see
http://www.alvestrand.no/objectid/) or Intellectual Property Digital
Object Identifiers (http://www.doi.org/).
Every time I turn around it seems there is some new scheme for unique
IDs being promoted as the greatest thing since sliced bread. The "info:"
URI scheme is not an attempt to create Yet Another Unique ID, but a
way to consolidate many of the schemes already in play. It doesn't have
a broad enough scope to rely on entirely, but I quite like what it is
attempting to do. So I would suggest we adopt "info:" as one of the
permissible identification schemes. Indeed, I think we ought to require
that _all_ identifiers (in the "orp:" IRN) have a scheme, or namespace,
specifier.
As examples, let's use "The Adventures of Sherlock Holmes," which is
composed of 12 short stories which are in the public domain. Let's also
say that I write a copyrighted introduction to the collection, for which
I buy the DOI "10.1000/123." Then I create a new publication with the
ISBN number 1-234567-890-X. The IRN for the introduction might be:
orp:/urn:isbn:1-234567-890-X/info:doi:10.1000%2F123
Note that because the Digital Object Identifier scheme uses slashes as
part of their naming scheme, whereas we use slashes to indicate path
components, the slash in the DOI identifier must be percent encoded.
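A minimal Python sketch of that escaping step, assuming the orp:/pubid/resid syntax proposed in this thread. The helper name make_orp_irn is made up for illustration and is not part of any OpenReader specification. Note that urllib.parse.quote also escapes non-ASCII characters, so what comes out is the plain URI form rather than a full IRI.

```python
from urllib.parse import quote

def make_orp_irn(pubid, resid, fragment=None):
    """Build orp:/pubid/resid[#fragment], escaping "/" inside identifiers."""
    # safe=":" keeps scheme separators such as "urn:isbn:" readable; "/" is
    # escaped because the orp: syntax uses it to separate path components.
    parts = [quote(p, safe=":") for p in (pubid, resid)]
    irn = "orp:/" + "/".join(parts)
    if fragment is not None:
        irn += "#" + quote(fragment, safe="")
    return irn

print(make_orp_irn("urn:isbn:1-234567-890-X", "info:doi:10.1000/123"))
# prints: orp:/urn:isbn:1-234567-890-X/info:doi:10.1000%2F123
```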
ISBN numbers are registered as part of the URN naming scheme,
(http://www.iana.org/assignments/urn-namespaces) and DOI identifiers are
registered as part of the Info naming scheme
(http://info-uri.info/registry/OAIHandler?verb=ListRecords&metadataPrefix=oai_dc).
So far I have not found any scheme which includes GUIDs, which are very
useful identifiers, and it is possible that some publisher, somewhere,
may have sufficient disdain for standards to want to use a totally
unique identification scheme. Taking my lead from MIME, I propose we
create a new naming scheme, "x-other:", which will be totally
uncontrolled, but which would require a sub-scheme. The scheme for GUIDs
could be "x-other:guid:", the scheme for SKUs could be "x-other:sku:",
and the scheme for Microsoft inventory numbers could be "x-other:MSInv:"
(For GUIDs, they are so useful I think we really ought to lobby the IANA
to add GUIDs to the URN registry). The Holmes stories are in the public
domain and have no ISBN or DOI (which is reserved for works that have
some intellectual property claim), so I may choose to use the Project
Gutenberg index number for the main work, and an arbitrary GUID for the
individual story. Thus, to get to part two of "A Scandal in Bohemia" the
IRN might be:
orp:/x-other:gutenberg-id:1661/x-other:guid:48123BC4-99D9-11D1-A6B3-00C04FD91555#Part2
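To make the proposed shape concrete, here is a rough Python sketch that takes such an IRN apart again. It assumes exactly two path components (pubid and resid), and it treats everything before the last colon of an identifier as its scheme, which would break for identifier values that themselves contain colons. The function names are invented for this example.

```python
from urllib.parse import unquote

def parse_orp_irn(irn):
    """Split orp:/pubid/resid[#fragment] into (pubid, resid, fragment)."""
    if not irn.startswith("orp:/"):
        raise ValueError("not an orp: reference: " + irn)
    path, _, fragment = irn[len("orp:/"):].partition("#")
    pubid, sep, resid = path.partition("/")
    if not sep or not pubid or not resid or "/" in resid:
        raise ValueError("expected orp:/pubid/resid: " + irn)
    # Undo percent-encoding only after splitting on the raw "/" separators.
    return unquote(pubid), unquote(resid), (unquote(fragment) or None)

def id_scheme(identifier):
    """Return the scheme prefix of an identifier, e.g. 'x-other:guid'."""
    scheme, sep, value = identifier.rpartition(":")
    if not sep or not scheme or not value:
        raise ValueError("identifier lacks a scheme: " + identifier)
    return scheme

pubid, resid, frag = parse_orp_irn(
    "orp:/x-other:gutenberg-id:1661/"
    "x-other:guid:48123BC4-99D9-11D1-A6B3-00C04FD91555#Part2")
print(id_scheme(pubid), id_scheme(resid), frag)
# prints: x-other:gutenberg-id x-other:guid Part2
```

Rejecting identifiers without a scheme prefix is one way to enforce the suggestion that all identifiers carry a scheme, or namespace, specifier.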
I hope this makes sense.
On Tuesday, December 20, 2005, 7:16:38 PM, Jon wrote:
>> If 'IRI' (RFC 3987) rather than 'URI' (RFC 3986) isn't critical,
>> 'info' URIs should fit the bill. A new IRI scheme would be competing
>> in an already crowded space. A somewhat dated (May 1998) survey of
>> electronic publication identifiers:
JN> Internationalization *is* critical, so the scheme *must* support IRI,
JN> in my opinion.
Yes.
JN> Am I right in assuming that extending an existing registered URI
JN> scheme to IRI is not allowed? (I'll have to reread IRI to see what it
JN> says about extending existing URI schemes. Maybe this is allowed.)
No. It's allowed, and common. However, the IRI might get converted to a
URI (hex escaped) when it's actually dereferenced.
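As a rough illustration of that conversion (a sketch only, not a full RFC 3987 implementation): each non-ASCII character is encoded as UTF-8 and its octets are percent-escaped, while ASCII characters are left alone. The function name iri_to_uri and the sample identifier are made up for this example, and host-name handling is ignored because the proposed orp: form has no authority part.

```python
from urllib.parse import quote

def iri_to_uri(iri):
    """Percent-encode the non-ASCII characters of an IRI as UTF-8 octets."""
    # Every printable ASCII character is declared "safe", so existing
    # delimiters and "%" escapes pass through unchanged.
    ascii_chars = "".join(chr(c) for c in range(0x21, 0x7F))
    return quote(iri, safe=ascii_chars)

print(iri_to_uri("orp:/myid1234567890/résumé#para123"))
# prints: orp:/myid1234567890/r%C3%A9sum%C3%A9#para123
```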
--
Chris Lilley mailto:chris@...
Chair, W3C SVG Working Group
W3C Graphics Activity Lead
Co-Chair, W3C Hypertext CG
Peter Ring wrote:
> For inspiration, and to get an idea of what it takes to register
> yet another URI or IRI scheme, have a look at the 'info' URI scheme,
> which is designed to serve as a general basis for identifying
> information assets:
I appreciate your feedback, Peter. Definitely when embarking on
designing an IRI scheme, even a private one, we have to understand
what's out there.
(And for the record, I have no qualms with developing a private IRI
scheme if there's not a registered public one out there that meets
our requirements, and to wait to see how stable that scheme is before
attempting to register it.)
And thanks for pointing out the difficulties with URI/IRI and
registration. I sort of expect that registration of *any* URI/IRI
scheme will be difficult and arduous.
It is definitely important in this discussion to begin collecting and
collating requirements for our scheme. Maybe there's an existing one
we can embrace rather than building our own. I'll post my set of
requirements below. Others are free to add their own, or to argue
against some of mine.
> I find the marriage of 'info' (for identification) and OpenURL (for
> resolution) particularly interesting. OpenURL is quickly becoming
> the de facto standard for open referencing in the academic publishing
> world.
Definitely something to explore if it meets requirements, but I am
pessimistic it will, at least for the purposes I proposed the "orp"
IRI scheme. But let's give it a whirl!
> If 'IRI' (RFC 3987) rather than 'URI' (RFC 3986) isn't critical,
> 'info' URIs should fit the bill. A new IRI scheme would be competing
> in an already crowded space. A somewhat dated (May 1998) survey of
> electronic publication identifiers:
Internationalization *is* critical, so the scheme *must* support IRI,
in my opinion.
Am I right in assuming that extending an existing registered URI
scheme to IRI is not allowed? (I'll have to reread IRI to see what it
says about extending existing URI schemes. Maybe this is allowed.)
> Wrt. the use of an identification scheme for components of ebooks,
> AAP has published some guidelines that may serve as starting point
> for discussions:
>
> http://www.publishers.org/digital/numbering.pdf
Oh Boy! Deja vu. I remember this document from discussions in the
OeBF Metadata and Identifiers Working Group back a few years ago.
My first thought after reading it was "what a mess."
My problem with that AAP document is that it treats an ebook as a
"black box object", with the prime focus being retailing of said
"black boxes". It ignores completely deep-linking into the ebook
(which is a non-retail application.) I believe if the more important
non-retail requirements were considered (such as deep linking), the
final recommendation would probably have been different.
This document also makes an attempt to turn ISBN into a pseudo-Work
identifier (à la ISTC), which violates the spirit, if not the letter,
of the ISO standard underlying ISBN. The letter of the "law" with
regard to ISBN is that each different ebook format (e.g., PDF vs.
LIT) of the same book must be given different ISBNs, since they are
different manifestations as the customer buys them. Retailers do
need to differentiate between a PDF and a LIT version for sale since
they are different objects each with their own DRM fees to pay,
possible selling price differences, etc., etc. And end-users *want*
a particular format as well. Instead, the AAP document assumes that
formats don't matter -- an ebook is an ebook and darn the format. As
far as I can tell, the AAP document above has not become the de facto
standard in the ebook world, at least in its entirety.
Any ebook distributor/retailer care to comment on this?
******
Here's a first stab at a written requirements list for the IRI scheme
to enable intra- and inter-publication linking to, within, and between
OpenReader Publications:
1) Purpose: To allow addressing by identifier (not name/path) of any
resource within an OpenReader Publication.
2) The OpenReader Publication as a whole is addressed by its own
identifier (pubid) defined in its Binder document.
[Note! The pubid can be different than the identifier assigned (if
any) to the encapsulation of that OpenReader Publication. See
example below illustrating why this is important to differentiate.]
3) When the internal resource is an XML Content Document, the scheme
will point to any element identifier (i.e., fragment identifier.)
Later on, it may be extended to support XPointer's element()
scheme.
4) The IRI path component (or fragments thereof) should "look" similar
to what is done in web practice today (http). It should not use
unfamiliar syntax, nor need to use a query component for basic
functionality.
5) It must support IRI (RFC 3987, http://www.ietf.org/rfc/rfc3987.txt ).
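Requirements 1 through 3 can be pictured with a small Python sketch. The Binder has not been specified yet, so the dictionary below is only a hypothetical stand-in for its manifest, and the file paths are invented. The point is that a link names a resid, and only the manifest knows the current path.

```python
# Hypothetical stand-in for a Binder manifest: resid -> resource path.
manifest = {"chap1": "text/chapter1.xml", "chap2": "text/chapter2.xml"}

def locate(reference, manifest):
    """Map 'resid[#fragment]' to (path, fragment) via the manifest."""
    resid, _, fragment = reference.partition("#")
    return manifest[resid], (fragment or None)

print(locate("chap2#para456", manifest))
# prints: ('text/chapter2.xml', 'para456')

# Renaming the file changes only the manifest entry, not the link.
manifest["chap2"] = "body/ch02.xml"
print(locate("chap2#para456", manifest))
# prints: ('body/ch02.xml', 'para456')
```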
Important Definition:
OpenReader Publication: The collection of resources meeting the
requirements of the ORP Framework spec.
This definition is given because it is important to stress that an
OpenReader Publication is NOT the encapsulation/container file itself
(which encapsulates the OpenReader Publication.) Let me illustrate the
ramification of this vis-a-vis identifiers, ISBN's (for commerce
purposes), etc.:
Acme Publishing has produced two OpenReader Publications with
pubid's of ORP1 and ORP2, respectively.
Acme then sells ORP1 individually in a container given ISBN1, and
likewise sells ORP2 individually in a container given ISBN2.
Later on, Acme decides to wrap ORP1 and ORP2 into a single
container (e.g., called the "Collected Works of John Doe"), and
assigns ISBN3 to that container for sale. [Note that ORP1 and ORP2
maintain their independence -- they are just placed into the same
wrapper and sit side-by-side.]
This example shows why it is important that we, in our considerations,
decouple the container (which is the focus of the AAP paper described
earlier) from the Publication itself.
For example, after ISBN1 (with ORP1) and ISBN2 (with ORP2) are sold,
various third parties (and possibly Acme Publishing itself) have
created extensive links into ORP1 and ORP2 (e.g., references from
other ebooks, bookmarks, annotations, etc.) When Acme Publishing
rewraps ORP1 and ORP2 into a single container ISBN3, these links will
remain unbroken since the links have targeted ORP1 and ORP2 and NOT
ISBN1 and ISBN2.
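The Acme example can be sketched in Python as a lookup keyed on pubid rather than on ISBN. The catalogue dictionary and the function name are hypothetical, and chap1 and para123 are placeholder resid and fragment values.

```python
# Hypothetical catalogue: which container (by ISBN) currently holds
# which publications (by pubid). Names mirror the Acme example.
containers = {
    "ISBN1": ["ORP1"],
    "ISBN2": ["ORP2"],
    "ISBN3": ["ORP1", "ORP2"],  # the later "Collected Works" wrapper
}

def containers_holding(pubid, catalogue):
    """Return every container that includes the given publication."""
    return sorted(isbn for isbn, pubs in catalogue.items() if pubid in pubs)

link = "orp:/ORP1/chap1#para123"
pubid = link[len("orp:/"):].split("/", 1)[0]
print(containers_holding(pubid, containers))
# prints: ['ISBN1', 'ISBN3']
```

Because the link carries only the pubid, it resolves equally well against the original container and the rewrapped one.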
So whatever scheme we use for linking has to focus on the OpenReader
Publication and NOT on the container(s) that Publication will be
wrapped within. If we focus on the container (as the AAP paper
described earlier does), it becomes very impractical, if not
impossible, to build a robust, universal, permanent, machine-
enabled, ebook linking/annotation system. Ebooks will end up being no
better than p-books are now -- is this what we want for our ebook
future?
Jon Noring
For inspiration, and to get an idea of what it takes to register yet another URI
or IRI scheme, have a look at the 'info' URI scheme, which is designed to serve
as a general basis for identifying information assets:
https://datatracker.ietf.org/public/pidtracker.cgi?command=view_id&dTag=10863&rfc_flag=0
http://www.loc.gov/standards/uri/info.html
http://www.ietf.org/internet-drafts/draft-vandesompel-info-uri-04.txt
http://info-uri.info/
The IETF Tracker tells the story by itself; for a more vivid description, see
http://weibel-lines.typepad.com/weibelines/2005/11/for_your_info.html
I find the marriage of 'info' (for identification) and OpenURL (for resolution)
particularly interesting. OpenURL is quickly becoming the de facto standard for
open referencing in the academic publishing world.
If 'IRI' (RFC 3987) rather than 'URI' (RFC 3986) isn't critical, 'info' URIs
should fit the bill. A new IRI scheme would be competing in an already crowded
space. A somewhat dated (May 1998) survey of electronic publication identifiers:
http://hosted.ukoln.ac.uk/biblink/wp2/links.html
Wrt. the use of an identification scheme for components of ebooks, AAP has
published some guidelines that may serve as starting point for discussions:
http://www.publishers.org/digital/numbering.pdf
kind regards
Peter Ring
Magnus Informatik
a Wolters Kluwer business
> -----Original Message-----
> From: openreader-format@yahoogroups.com
> [mailto:openreader-format@yahoogroups.com]On Behalf Of Jon Noring
> Sent: 19. december 2005 19:45
> To: openreader-format@yahoogroups.com
> Subject: [openreader-format] The ORP IRI Scheme (IRI ==
> Internationalized URI)
>
>
> [For the overview, refer to message:
> http://groups.yahoo.com/group/openreader-format/message/310 ]
>
>
> Everyone,
>
> Another critical component of the OpenReader specification is to
> define an IRI scheme (where IRI is an internationalized URI per RFC
> 3987, see: http://www.ietf.org/rfc/rfc3987.txt )
>
> The purpose of the ORP IRI Scheme is to properly enable intra- and
> inter-publication linking (and resource association), into (from the
> outside world), within and between OpenReader Publications.
>
> The proposed ORP IRI Scheme is actually quite simple:
>
> orp:/pubid/resid[#fragment-id]
>
> { [#fragment-id] is optional }
>
>
> pubid: The unique identifier of the Publication, declared in the
> Binder.
>
> resid: The identifier of the target resource, such as a content
> document. The unique resid is given in the "manifest" in
> the Binder. It is associated with the resource name and
> optional path (more details when the Binder is discussed.)
>
> If the resid is a content document (XML), then any element in the
> document may be given an 'id' and referenced using the
> "fragment-id" (for fragment identifier) syntax. (The extension to
> XPointer schemes is obvious, particularly the useful element()
> scheme which may be supported at a future time.)
>
>
> Example:
>
> We have an OpenReader Publication with publisher-supplied unique ID of
> "myid1234567890". In the Binder, the publisher associated the content
> document resource "chapter1.xml" with the resid "chap1". In that
> content document resource there is a paragraph given the 'id' value
> of "para123" (i.e., <p id="para123"> ). We'd like to link to that
> paragraph from another publication or from the "outside" world:
>
> Link to the resource "chapter1.xml" itself:
>
> orp:/myid1234567890/chap1
>
> Link to the paragraph in "chapter1.xml":
>
> orp:/myid1234567890/chap1#para123
>
>
> Within content documents, it is assumed all IRI references (such as
> within the <a href=""> element), when not from another URI/IRI scheme
> such as "http", are ORP IRI references.
>
> Example:
>
> In "chapter1.xml" we have a hypertext link pointing to a paragraph
> (with 'id' "para456") in another content document in the same
> publication, named "chapter2.xml" with resid of "chap2". This link
> might take the form (using the Basic Content Document anchor element):
>
> <a href="chap2#para456">link to chapter 2</a>
>
> The full "orp:/pubid/" need not be prepended since it is assumed
> for a "local" link. Of course, "orp:/pubid/" is needed if linking
> into a publication from outside the publication, either from another
> OpenReader Publication, or from some application in the outside world.
>
> In addition to hypertext linking, the ORP IRI scheme may be used for
> other related purposes, such as associating objects such as external
> notes and bookmarks, etc.
>
>
> *****
>
> Of course, the obvious question is why the need for "resid"? Why
> not just address the actual resource name and path? That will be
> more fully explained when I outline the Binder document. But in
> short:
>
> 1) Gives greater flexibility for publishers to alter the names and
> paths of resources in publications without breaking existing links.
>
> (This may also matter when a publisher encapsulates two or more
> publications and there is a resource path/name clash; they may want
> to "move stuff around".)
>
> 2) In addition, a resource, such as a content document, can be used
> more than once in an OpenReader Publication. Each use will be given
> its own resid (that is, a resource can have multiple resid). This
> enables links to target the particular use of the resource in the
> publication and not to the "generic" resource itself.
>
> (Also note, as an aside, that by this technique one may apply
> different style sheets to the same content document depending upon
> how/where it is used in the publication when used multiple times in
> different places. This is one reason among several why all style
> sheet information is removed from content documents.)
>
>
> Comments, criticisms? Am I missing something critical in the
> syntax for the ORP IRI scheme?
>
> Jon