All:
Might this also be a situation where Napster's peer-to-peer technology could
do a lot of public good? Somehow have a site where individuals could
"register" as the steward for a particular time period (of any length) with
appropriate metadata that would be searchable?
I don't know the technical details on how to get from here to there, but
just an observation/suggestion.
-Tom Johnson
===============================================
J. T. Johnson
Institute for Analytic Journalism
tom@... www.jtjohnson.com
505.577.6482[o] 415.824.3521[h] 505.577.6482[c]
JAGIS http://groups.yahoo.com/group/JAGIS-L
===============================================
-----Original Message-----
From: Brewster Kahle [mailto:brewster@...]
Sent: Sunday, January 26, 2003 11:38 AM
To: archivists@yahoogroups.com; archivists-talk@yahoogroups.com
Cc: Simon Carless
Subject: [archivists] Internet Archive CD-ROM archive - proposal from a
volunteer
[Simon visited the Internet Archive on Friday to start as a volunteer on the
CDROM collection.
http://www.archive.org/cdroms/cdroms.php
This collection is in need of love and care, and Simon is very interested in
helping.
I send on his note (with less appropriate parts yanked) to elicit comment
and suggestions. I suggest you write to archivists-talk@yahoogroups.com
or straight to Simon.
-brewster]
---
INTRODUCTION
This proposal deals with the best way to archive the CD-ROMs in
the Internet Archive's Macromedia collection. The collection
comprises many thousands of CD-ROMs of PC, Mac, and PC/Mac
format, mainly made between the years of 1994 and 2000.
Although a number of people online are (unofficially) archiving
console software and game ROMs, nobody is making sure there
are perfect digital copies and databases of the PC/Mac CD
'multimedia' boom and bust of the early and mid 90s. This is a
_vital_ pre-broadband era where some of the first widely available
ideas of 'virtual reality' and cinema-quality 3D graphics for the home
were being explored (see 'Myst'!).
Although the Internet has now superseded a lot of the multimedia
ideals the Macromedia collection stands for, that's precisely WHY the
collection is important - as a document of what the era stands for. As
an added impetus, the collection is stored on decayable CD media, and
it's not clear how long it will be before these discs lose their
reflective surfaces and become unplayable (some people claim 10 to 25
years!).
Making copies of the discs and their artwork now and storing them in a
searchable database will help current and future historians of the
era, and making the most interesting and relevant material available
for download (with the full permission of the copyright holders!) will
make people who love abandonware and free software VERY happy.
1. CD-ROM ARCHIVE FORMAT
The first important decision is how best to archive the discs as an
exact copy, and then how best to distribute them to the public and use
them in other ways.
The official FAQ for the newsgroup alt.binaries.cd.image
recommends using an .ISO format for a CD that has one data-only
track, and a .bin/.cue format for a Mode 2/Mixed Mode CD - i.e., one
that has a data track and multiple audio tracks. Another possibility
is that the program is simple enough that the files could be extracted
directly to hard disc from, say, a .ZIP file, and they would still
run.
So this leaves us with 3 possibilities:
- .BIN/.CUE - a 'perfect' digital copy of the disc. Needs to be burnt
to disc before it will work, however.
- .ISO or .ISO/.WAV - a copy of the disc that should be perfect if
there isn't any exotic copy protection or multiple audio files also on
the disc. You can handle audio files as WAVs alongside ISOs, but
re-burning them might be confusing.
- .ZIP - a zipped-up version of the files contained on the disc.
These formats all have their advantages and disadvantages. I
personally think we should discount .ZIP as a format because:
1. It's fairly easy to run ISOs as virtual CD-ROM drives on the PC -
there's a simple setup for it. This will mean that we're really
providing the CD-ROM 'as is' if we provide an ISO - it's a fairly pure
version of the original disc which may also pass security checks to
see if the CD-ROM is present.
2. It's also possible to extract files from ISOs easily with the
IsoBuster utility on PC. So if people don't like having virtual
CD-ROM drives, they can just extract the files that way.
3. I wouldn't think .ZIP deals with dual PC/Mac format discs well at
all, whereas .BIN/.CUE _should_, and .ISO _might_ - hah!
So .ISO is a good format, but I'm not sure it deals so well with
multiple audio tracks. So my temptation right now would be:
- .BIN/.CUE for the 'master' copy of everything.
- .ISO for any CD that only has one data track.
- _maybe_ .ISO and .WAV for CDs with extra audio tracks - we
need to research how easy it is to emulate and re-burn these.
We are, unfortunately, creating twice the data this way, though.
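The rule of thumb above could be automated from the .cue sheet that comes with each .bin image. What follows is only a rough Python sketch, not a tested tool: it assumes standard cue-sheet TRACK lines (e.g. "TRACK 01 MODE1/2352", "TRACK 02 AUDIO"), and the function names and the sample cue text are made up for illustration.

```python
import re

# Matches cue-sheet lines such as "TRACK 01 MODE1/2352" or "TRACK 02 AUDIO".
TRACK_RE = re.compile(r"^\s*TRACK\s+(\d+)\s+(\S+)", re.IGNORECASE)


def track_types(cue_text):
    """Return the type of each TRACK line in the cue sheet, in disc order."""
    types = []
    for line in cue_text.splitlines():
        match = TRACK_RE.match(line)
        if match:
            types.append(match.group(2).upper())
    return types


def suggested_formats(cue_text):
    """Apply the proposal's rule of thumb to one disc (hypothetical helper)."""
    types = track_types(cue_text)
    data_tracks = [t for t in types if t != "AUDIO"]
    audio_tracks = [t for t in types if t == "AUDIO"]
    formats = ["BIN/CUE"]  # master copy of everything
    if len(data_tracks) == 1 and not audio_tracks:
        formats.append("ISO")  # one data-only track
    elif len(data_tracks) == 1 and audio_tracks:
        # Only a 'maybe' in the proposal: needs emulation/re-burn research.
        formats.append("ISO+WAV (pending re-burn research)")
    return formats


# Made-up example: one data track followed by one audio track.
EXAMPLE_CUE = """\
FILE "disc.bin" BINARY
  TRACK 01 MODE1/2352
    INDEX 01 00:00:00
  TRACK 02 AUDIO
    INDEX 01 12:34:56
"""

print(suggested_formats(EXAMPLE_CUE))
```

Anything unusual (several data tracks, copy protection) just falls back to the BIN/CUE master in this sketch, which seems the safe default.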
There are some Mac issues that need working through, but Macs can
burn .ISO without any trouble, and Toast for Mac can burn
.BIN/.CUE. Need to make sure backing up an .ISO from a PC won't
negate the Mac-compatible bits of the disc, mind you - some
testing needed.
[Multiple audio tracks are definitely an issue with a minority of the
Macromedia collection, by the way, because CD-quality audio was one of
the main draws of multimedia at that time, so many applications played
music from the CD drive whilst the program was running.]
2. CD-ROM ARTWORK FORMAT
Eventually, the manuals deserve to be scanned in full for posterity,
time and funds permitting. Since we have less of both for
now, scanning the front and back covers of the CD-ROM and making all
of them available online (whether the file image is available for
download or not) would do a LOT to enhance the visual nature and
attractiveness of the collection, especially for those titles that
can't be downloaded.
So the suggestion for artwork for now is the front and back covers of
the CD packaging _OR_ CD case only at the following sizes:
- master offline image - .TIFF at very high scan quality, will never
be posted on the website.
- master online image - .JPG at size which enables you to read all
text. You'll get this when you click on the thumbnails on the website.
- thumbnail online image - .JPG at small size, as with current
thumbnails showing on site.
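To make the three artwork sizes concrete, here is a small Python sketch that works out the pixel dimensions of each derivative from one scan. The pixel limits (1200 and 150) and the file naming scheme are purely my assumptions for illustration; nothing here has been decided, and the real limits would depend on what keeps the text readable.

```python
# Hypothetical longest-side limits in pixels; the proposal gives no numbers.
ONLINE_MAX_SIDE = 1200
THUMBNAIL_MAX_SIDE = 150


def fit_within(width, height, max_side):
    """Shrink (width, height) so the longer side is at most max_side."""
    longest = max(width, height)
    if longest <= max_side:
        return (width, height)  # never upscale a small scan
    scale = max_side / longest
    return (max(1, round(width * scale)), max(1, round(height * scale)))


def plan_artwork(title_id, side, scan_size):
    """Map each derivative file name (hypothetical scheme) to its pixel size."""
    width, height = scan_size
    return {
        # Offline master: kept at full scan resolution, never posted.
        f"{title_id}_{side}_master.tiff": (width, height),
        # Online master: shown when a thumbnail is clicked.
        f"{title_id}_{side}_online.jpg": fit_within(width, height, ONLINE_MAX_SIDE),
        # Thumbnail: small image for listing pages.
        f"{title_id}_{side}_thumb.jpg": fit_within(width, height, THUMBNAIL_MAX_SIDE),
    }


for name, size in plan_artwork("disc0001", "front", (3000, 2000)).items():
    print(name, size)
```

The sketch only plans names and sizes; the actual resizing would be done with whatever imaging tool we settle on.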
3. 'MACROMEDIA COLLECTION' CD-ROM ARCHIVE CONTENTS
It's important to recognise that ALL of the CD-ROMs in the
collection are important. But equally, with such a large number of CDs
to sort through, I think the collection should be prioritised into
three different areas.
1. PRIORITY - these CD-ROMs should be dealt with first, because
they offer information that's not available elsewhere (a museum
CD-ROM about totem poles, for example), they're good examples of
multimedia from the time (an educational adventure about dinosaurs),
or they're good pieces of cultural ephemera (the Betty Ford Clinic
promotional CD-ROM or the 'magazine on a disc' ventures.)
2. NON-PRIORITY - these CD-ROMs are still important and should
be dealt with when time and funds permit, but they either contain
information that is NOT media rich (simple training programs which
would be shown on webpages nowadays) or don't have the CD-ROM as their
main focus (a music album with a small amount of added multimedia
content).
3. JAPANESE-LANGUAGE - I suspect these discs should be
separated out, because we need to look at compatibility issues
with backing up (can you back up Japanese-language discs if you
don't have J-Win installed?) and playing issues (do you need J-Win to
run these discs?) If we can work out compatibility problems, we can
then prioritise them into one of the two categories above.
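If the catalogue ends up in a database, the three areas above could be recorded per disc along these lines. This is just a Python sketch of the sorting rule as I've described it; the field names and the `categorise` helper are hypothetical, and a human would still be making the actual judgement calls.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    PRIORITY = "priority"
    NON_PRIORITY = "non-priority"
    JAPANESE_LANGUAGE = "japanese-language"


@dataclass
class Disc:
    """Catalogue flags for one CD-ROM (hypothetical field names)."""
    title: str
    japanese_language: bool = False
    unique_information: bool = False       # not available elsewhere
    good_multimedia_example: bool = False  # typical of multimedia of the time
    cultural_ephemera: bool = False


def categorise(disc, japanese_compat_resolved=False):
    """Sort a disc into one of the three areas from the proposal."""
    # Japanese-language discs stay separate until backup and playback
    # compatibility has been worked out.
    if disc.japanese_language and not japanese_compat_resolved:
        return Category.JAPANESE_LANGUAGE
    if (disc.unique_information
            or disc.good_multimedia_example
            or disc.cultural_ephemera):
        return Category.PRIORITY
    return Category.NON_PRIORITY


totem = Disc("Museum totem pole CD-ROM", unique_information=True)
album = Disc("Music album with bonus content")
print(categorise(totem).value, categorise(album).value)
```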
4. 'INTERNET ARCHIVE COLLECTION' CD-ROM ARCHIVE CONTENTS
There is probably room for a new collection, which will at first be VERY
small, which could be called the 'Internet Archive Collection', since
that's who will be assembling it. The point of this is - when we come
out and (re)launch the site, there needs to be at least SOME
multimedia CD-ROM stuff on there to download that people will get
excited about. Some of this may be cherry-picked from elsewhere than
the Macromedia archive. Right now I'm particularly thinking of:
1. Voyager Company titles - this was the CD-ROM part of the well-known
Criterion Collection laserdiscs and DVDs. I think a company called
Learntech owns the rights to the Voyager Company titles right now, but
we should definitely find out whether this would be possible.
2. Cyan titles - the earlier pre-Myst titles from Cyan like 'Cosmic
Osmo' and 'Manhole' are resoundingly out of print. Got to be worth a
try.
3. 'Total Distortion' from Joe Sparks and Pop Rocket - a classic
proto-multimedia release from the guy who has now gone on to create
Devildoll and Radiskull for Shockwave :)
4. 'Starship Titanic' interests me a lot, but I have no idea whether
that would be a possibility. It was Douglas Adams' last CD-ROM project
and is now out of print.
The rights issues for some of these are definitely problematic,
though. Which brings me on to..
5. MAKING CD-ROMS REMOTELY ACCESSIBLE
I know this was one of the original goals of the project, and I've
been looking a little at the technical issues. The problem definitely
seems to be that most of these multimedia CD-ROMs play audio and video
files, and I just don't see a possibility of them streaming properly
over a normal broadband network with a VNC-like 'PCAnywhere' piece of
software running. Looking at messageboards, people are having
significant trouble just over their LAN. Simple Director-authored
things with easy animations and links might work ok, but that's not
necessarily where the meat of the interest in the collection lies,
imho.
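Some back-of-envelope arithmetic helps show why audio is the sticking point. The Python sketch below compares the raw rate of uncompressed CD audio (44,100 samples/s x 16 bits x 2 channels) and a 1x CD-ROM data stream (75 sectors/s x 2048 bytes) against a few link speeds. The link speeds are my own illustrative assumptions, and a real remote-desktop session sends screen updates on top of this, so treat it as rough context only.

```python
# Raw media rates in bits per second.
CD_AUDIO_BPS = 44_100 * 16 * 2   # uncompressed stereo CD audio
CDROM_1X_BPS = 75 * 2048 * 8     # 1x Mode 1 CD-ROM data rate

# Illustrative link speeds (assumptions, not measurements).
LINKS = {
    "hypothetical 384 kbit/s DSL": 384_000,
    "hypothetical 1.5 Mbit/s line": 1_500_000,
    "100 Mbit/s LAN": 100_000_000,
}


def headroom(link_bps, stream_bps):
    """How many times over the link could carry the stream (below 1 = too slow)."""
    return link_bps / stream_bps


for name, bps in LINKS.items():
    print(f"{name}: {headroom(bps, CD_AUDIO_BPS):.2f}x CD audio, "
          f"{headroom(bps, CDROM_1X_BPS):.2f}x 1x CD-ROM data")
```

Anything that comes out under 1.0 can't carry even the raw audio in real time, before any video or screen traffic is added.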
But it's CERTAINLY worth doing LAN tests to see if things will
behave properly, with a view to making machines remotely
accessible over either broadband (slowly) or Internet2 (quicker!) if
other issues with security and suchlike can be resolved. If this could
work, it would rock :)
Simon Carless
Jan.26th 2003