Friends,
In response to Lee's comments I threw out a "fishing net" and pulled in a small
but varied catch.
1. Aaron Swartz wrote of "the work of the Internet Archive". Is that what this
list is meant to be about? I had completely forgotten about it, but by plugging
www.archive.org into my web browser, I was reminded that it was here that I
joined this list! I notice also that the site still says that since 1998 they
have only been collecting ASCII text. Is that really still the case?
Aaron asked if we should "focus on more specialized archives rather than trying
to archive the entire Web". Indeed my hope was to elicit help from other people
in how to archive an extremely specialised subset of electronic documents (about
or mentioning the Baha'i Faith) with extremely limited resources - only a part
of my job, and just me with one lowly PC attached to a network, as part of a
total library staff of 15 people.
2. Deborah Woodyard pointed out Pandora at the Australian National Library. A
fascinating effort that I had indeed missed (and me being Australian too - I'm
terribly embarrassed! - obviously not doing enough homework). Since we consider
ourselves a depository library (albeit voluntary), the documents on this site
will be of great help.
A question, Deborah, if I may: How many people are employed in that project, what
is its budget, and is there a measure of volume - i.e. pages saved per day or
something? If this is all on the site somewhere, feel free to kick me in the
right direction...
3. Bob Mulrenin spoke of XML formats and database server software that went
completely over my head - thus pointing out yet another huge gap in my
knowledge that I need to fill. Since what I'm working on is really a pilot
project, I welcome the hint that there is a totally different way of going about
it all - and look forward to learning more about it.
4. Harry Verwayen responded with comments that made it sound as though the scale of
his operation may be closer to what I'm doing. He also spoke of EAD standards
(mentioned also by Bob above) which I need to now investigate and attempt to
understand.
5. Henry Gladney replied to me directly (but I hope that was only a result of
the "bug" and that he will forward it to the list) pointing out the problem of
"bleeding edge" technologies leaving "nothing you are currently saving ...
accessible to anyone". He also pointed me to a PDF file of a presentation he
gave earlier this year at
http://www.almaden.ibm.com/cs/people/gladney/100Year.pdf. His recommendation to
just print out whatever we want to save still seems to be the only really
sensible answer - and one that we have been doing - even if one does miss much
of the "experience" once it is printed.
So from the National Library of Australia, to IBM research centers, to me at my
desk - quite a range in such a small number of answers, and a reminder of the
enormity of the task.
From down here in the trench, I think I can only say that I'm grateful that
greater minds than mine are onto the problem. I should continue my pilot project
for my organization, and with any luck the huge wall of backward compatibility
of software, firmware and hardware will somehow dissolve as we approach it due to
the work of those greater minds (and bigger - but nowhere near big enough -
budgets).
warm regards,
Bryn Deamer
_______________________________________________
Archivists mailing list
Archivists@...
http://www.archive.org/mailman/listinfo/archivists