My son put a few books on Sri Lankan history online at my website,
http://lakdiva.org/. Instead of just posting images of the pages, he
ran OCR on them and put the books online as plain HTML text, as with
the Gutenberg project. The motivation for text was to reduce download
time, which is an important consideration, especially in the
developing world, and to make the text keyword-searchable.
Last month I was in my homeland of Sri Lanka and met Prof. Weerakody
of the Western Classics department at the University of Peradeniya.
He thanked me for putting the "Mahavamsa" (the Great Chronicle of
Lanka) online as text, since for the first time he can have the
computer read it to him. He is totally blind.
Making text accessible to the blind, I felt, was another important
reason to put books online as text. It is a pity that there is not
much research on improving OCR accuracy from about 98% to the 99.99%
needed to make it practical to OCR the volumes of scans that are now
available online and publish them as text.
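To put those two accuracy figures in perspective, here is a rough sketch of the expected number of wrong characters per page. The figure of 2,000 characters per page is an assumption chosen only for illustration, not a measurement of any particular book, and the function name is hypothetical.

```python
# Rough illustration: expected OCR errors per page at a given character accuracy.
# CHARS_PER_PAGE is an assumed, illustrative figure, not a measurement.
CHARS_PER_PAGE = 2000


def expected_errors(chars: int, accuracy: float) -> float:
    """Expected number of wrong characters among `chars` characters."""
    return chars * (1.0 - accuracy)


if __name__ == "__main__":
    # Compare the two accuracy levels mentioned in the text.
    for accuracy in (0.98, 0.9999):
        errors = expected_errors(CHARS_PER_PAGE, accuracy)
        print(f"{accuracy:.2%} accuracy: about {errors:.1f} errors per page")
```

Under that assumption, 98% accuracy leaves roughly 40 errors on every page, while 99.99% leaves about one error every five pages, which is the difference between text that needs heavy proofreading and text that can be published nearly as is.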
Coming from a background in automated astronomical image processing,
I feel there is a lot of scope for improving current OCR, most of
which was developed over a decade ago. I would like to contact
research groups that may be working on improving OCR.
Thanking you
with best regards
Kavan U. Ratnatunga
Senior Research Scientist
Department of Physics, WEH-8305
Carnegie Mellon University
5000 Forbes Avenue
Pittsburgh, PA 15213-3890
kavan@...
http://makara.phys.cmu.edu/~kavan
Work: 1-412-268-1888 Fax: 1-412-681-0648
On 2003.06.06 09:19 archivists@yahoogroups.com wrote:
> From: brewster kahle <brewster@...>
> Clearly, the idea of the bookmobile - putting the public domain into
> action by creating books - has potential, especially in the
> developing world.
>
> -brewster
> Digital Librarian
> Internet Archive
>