Lee,
Until your email I'd forgotten the list even existed.
Perhaps just to start something I'll describe a little bit of what our library
is trying to do in the way of internet archiving - and the way I do it. This may
bring forth some comments - like - "That's crazy - why don't you.... whatever.."
:) Any suggestions will be gratefully received.
The Baha'i World Centre Library (http://library.bahai.org) is attempting to
maintain as complete a record of the development of the Baha'i Community as
possible. All Baha'i communities and publishers are requested to deposit copies
of their publications with the Library. We also try to collect any mention of the
Baha'i Faith in any source whatsoever.
Since 1989 we have been receiving a few community or association newsletters by
email. Until 1999, when the World Centre changed to PCs with NT, we were using
UNIX in a character-based environment. Thus such newsletters were saved to
sub-directories on the UNIX system. More and more such items were being received
by the end of the decade. We used PINE to save the files - many of which we
could not even read at the time.
However, with the change to PCs and NT during 1998/9, a whole new world opened
up.
The Library was granted space to create an INTERNAL Web page. This has been
done, with one section being the "Electronic Collection". To this is saved any
electronically received item we can. At the moment there are 8 parts:
1. Annual and Convention reports of the National Baha'i Communities
2. Electronic Baha'i newsletters
3. Electronic Commercial and Scholarly journals
4. Miscellaneous
5. Online news source archive
6. Radio and TV Transcripts
7. Theses and Dissertations
8. Web page archives.
The goal is to try to save whatever we get in a way that future users will be
able to experience what current users experience. This often means going into
the code or program and changing it so that it is viewable in the available suite
of Microsoft tools. Thus there is some reduction of the archival integrity in
favour of the final product looking as much as possible as it did originally.
The items saved come from either emails or the Web.
Within emails, the items may arrive in HTML form, as word-processed attachments
(Word, WordPerfect, etc.), as plain ASCII text or as some hybrid.
They are all saved to the networked drive. HTML emailed files are then opened in
Internet Explorer from the network drive and edited with Notepad to ensure that
all the links work in the local environment. They are then imported into the
Library's Web using FrontPage 2000. Non-HTML files are examined to see if they
will be opened by Internet Explorer with some semblance of their original look.
Word files are simply opened, but some text files do get messed up. I do
whatever I can to try to save their original look as much as possible -
sometimes converting them to HTML, sometimes not. I then import those to the
Libarchive web site and create the links.
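As a rough illustration of the link-fixing step (which I do by hand in Notepad), here is a minimal Python sketch that rewrites absolute links pointing back at the originating site into relative ones. The function name localize_links and the example.org addresses are hypothetical, and the sketch assumes the saved file sits at the root of the local copy; it is not a description of any tool we actually use.

```python
import re


def localize_links(html: str, base_url: str) -> str:
    """Rewrite href/src values that start with base_url into relative paths.

    Assumption: the saved page lives at the root of the local copy, so
    stripping the site prefix yields a working relative link.
    """
    base = base_url.rstrip("/") + "/"
    pattern = re.compile(
        r'(?P<attr>\b(?:href|src)\s*=\s*)(?P<q>["\'])'
        + re.escape(base)
        + r'(?P<rest>[^"\']*)(?P=q)',
        re.IGNORECASE,
    )

    def repl(match):
        # A bare site URL is assumed to mean the site's index page.
        rest = match.group("rest") or "index.html"
        quote = match.group("q")
        return f"{match.group('attr')}{quote}{rest}{quote}"

    return pattern.sub(repl, html)


if __name__ == "__main__":
    page = '<a href="http://example.org/news/a.html">A</a>'
    print(localize_links(page, "http://example.org"))
```

Links to other sites are left untouched, so they still need to be checked by eye.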
Thus the items exist in two places - on the network drive (sometimes in both
the original format and a local copy, but we have not been consistent about
this) and in the Intra-web.
Initially, the Library only had one Web, but after about 3 months of building the
Web we suddenly found that all the HTML pages had been "themed" by FrontPage.
This is sort of like taking the Mona Lisa and changing its background and
colours!
Thus a second web - Libarchive - was created, in which no themes are used. The
advantage of having saved the items on the network drive became apparent, as we
were able to re-import them all. The Libarchive web has no home pages of its
own. These are all created on the main Library Web, get themed appropriately,
then point to the unthemed Libarchive site.
Archiving the Web items is also very challenging.
For single pages, we can usually just save the page in Internet Explorer, which
nicely saves the associated images etc. in a sub-directory associated with the
file. These are then imported into the Libarchive web and linked to from the
index pages in the main web.
This is mainly how we archive any reference to the Baha'i Faith in online news
sources and other web sites.
For entire web sites, I have been using a program called Web-stripper. However,
I have not yet fully come to terms with it, and sometimes find myself either
saving too much or too little. Further, updating an archived site is problematic.
We don't always want to overwrite the existing pages, since they can be
considered an earlier edition, but we don't want to duplicate the entire site.
Thus far we have got around it by first renaming the original site's home page
and saving any image files associated with it to another part of the Web before
updating the entire site. That way we can save a snapshot of what the site used
to look like at different points in its history.
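The snapshot idea can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name snapshot_home_page, the dated folder naming and the list of image suffixes are all made up for the example, it copies rather than renames, and it only looks at images sitting in the same directory as the home page.

```python
import shutil
from datetime import date
from pathlib import Path


def snapshot_home_page(site_dir, snapshot_root, home_page="index.html",
                       image_suffixes=(".gif", ".jpg", ".jpeg", ".png"),
                       stamp=None):
    """Copy a site's home page and its top-level images into a dated folder.

    Hypothetical helper: folder naming and suffix list are assumptions.
    """
    site_dir = Path(site_dir)
    stamp = stamp or date.today().isoformat()
    dest = Path(snapshot_root) / f"{site_dir.name}-{stamp}"
    # Refuse to overwrite an existing snapshot taken with the same stamp.
    dest.mkdir(parents=True, exist_ok=False)

    shutil.copy2(site_dir / home_page, dest / home_page)
    for entry in site_dir.iterdir():
        if entry.is_file() and entry.suffix.lower() in image_suffixes:
            shutil.copy2(entry, dest / entry.name)
    return dest
```

Run before each update of the site, this would leave one small dated folder per earlier edition without duplicating the whole site.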
It is understood that many of the links within a file lead to external pages and
will degrade over time. It is also understood that many of the advertising links
still point to an external site and will continue to show new and different
advertising from what was seen originally - until such time as that link dies
and an empty space is left.
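To get a feel for how exposed a saved page is to this kind of decay, one could list the links and images that still point outside the archive. The following Python sketch uses only the standard library; the names ExternalLinkCollector and external_links are hypothetical, and it treats anything with an http or https address as external, which is an assumption.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class ExternalLinkCollector(HTMLParser):
    """Collect (tag, url) pairs for href/src values with an http(s) scheme."""

    def __init__(self):
        super().__init__()
        self.external = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser hands over tag and attribute names in lower case.
        for name, value in attrs:
            if name in ("href", "src") and value:
                if urlparse(value).scheme in ("http", "https"):
                    self.external.append((tag, value))


def external_links(html):
    """Return the external references found in one saved HTML page."""
    collector = ExternalLinkCollector()
    collector.feed(html)
    collector.close()
    return collector.external
```

An img entry in the result would typically be an advertising banner still being fetched from the original server.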
After some 6 months of experimentation the Libarchive web has 7,511 files, of
which (if I understand the FrontPage Site Summary) 2,903 are pictures.
At the moment all of this is only visible within our organization - the Baha'i
World Centre - and does not exist on our public Web site. Various decisions need
to be made before the collection, or parts of it, could be made public.
So that's it in a nutshell. Does it sound anything like what somebody else is
doing out there?
warm regards,
Bryn Deamer
Electronic Information Systems Librarian
Baha'i World Centre Library
xlib@...
http://library.bahai.org
_______________________________________________
Archivists mailing list
Archivists@...
http://www.archive.org/mailman/listinfo/archivists