You have discovered us! Yes, it is the Internet Archive which hosts this
discussion list. For those who are not familiar with us, please go to
www.archive.org for details, but here is some top line information:
We are located in the Presidio of San Francisco, have been in existence for
4 years, but active for 1. We are trying to create a coherent and thorough
archive of the Internet to be made available for free to researchers and
scholars. Our current collection is indeed primarily HTML text; however, we
have just begun to collect Web images and, moving forward, will collect
other sorts of Internet files.
Anybody can get an account with us which allows access to the archives
(http://www.archive.org/proposal.html), but please be aware that our
technological development is still underway. In practical terms that means
the collections are accessible in the UNIX environment only and are in large
flat files that take some computer science knowledge to use. Over the course
of the next year, we will be developing tools which will enable
non-computer-science researchers to use the material.
In terms of providing spiders for others to create specialized crawls: that
is a great idea. We're not able to do that yet, but it is something we have
discussed. We are interested in receiving data donations as long as we can
make them available for free to our users. If someone wanted to set up a
partnership in which Partner A provided the crawler to Partner B to run, and
we saved and served the resulting data, we would be interested in working
that out. We would also be interested in running additional crawlers that
someone else (Partner A) provides.
I hope this clarifies and provides more food for thought and collaboration.
Cordially,
Marlita Kahn
Managing Director, Internet Archive
415-561-6802
-----Original Message-----
From: archivists-admin@...
[mailto:archivists-admin@...]On Behalf Of Aaron Swartz
Sent: Monday, July 24, 2000 8:39 AM
To: archivists@...
Subject: Re: [Archivists] A variety of fish in my net!
Electronic Information Systems Librarian <xlib@...> wrote:
> 1. Aaron Swartz wrote of "the work of the Internet Archive". Is that what
> this list is meant to be about? I had completely forgotten about it, but by
> plugging www.archive.org into my web browser, I was reminded that it was
> here that I joined this list! I notice also that the site still says that
> since 1998 they have only been collecting ASCII text. Is that really still
> the case?
>
> Aaron asked if we should "focus on more specialized archives rather than
> trying to archive the entire Web". Indeed my hope was to elicit help from
> other people in how to archive an extremely specialised subset of
> electronic documents (about or mentioning the Baha'i Faith) with extremely
> limited resources - only a part of my job, and just me with one lowly PC
> attached to a network, as part of a total library staff of 15 people.
Well, the website says the list is for "discussion on Internet libraries"
which is rather broad. Perhaps the Internet Archive could work out a
distributed system allowing people like you to work on smaller subsets
(Baha'i) of the Web and contribute your work to the archive. The archive
could provide you with the tools and technologies to spider and store the
information you'd like, and in return you could provide them with the data.
However, I haven't yet heard from anyone at the archive on this list, so I
don't know how feasible this is.
--
Aaron Swartz |"This information is top security.
<http://swartzfam.com/aaron/>| When you have read it, destroy yourself."
<http://www.theinfo.org/> | - Marshall McLuhan
_______________________________________________
Archivists mailing list
Archivists@...
http://www.archive.org/mailman/listinfo/archivists