Kavan Ratnatunga wrote:
> Making the text accessible to the Blind I felt was another important
> reason to put books on line as text. It is a pity that there is not
> much research to improve text OCR from about 98% to the 99.99%
> accuracy needed to make it practical to convert the volumes of scans
> that are now available online into text.
I don't know what alphabet these texts use, but in my experience the
ABBYY FineReader (www.finereader.com) OCR software gives very good
results with Swedish text. But even if you get 99+% accuracy, there
can always be one more error left. The best system in my experience
is to publish *both* the page image and the OCR text, and to allow
readers to submit corrections (to proofread) directly over the
Internet.
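To put the accuracy figures in perspective, here is a rough sketch of
how many wrong characters each accuracy level leaves on a page. The
figure of 2,000 characters per page is an assumption chosen only for
illustration, and the function name is made up for this example.

```python
def expected_errors_per_page(accuracy, chars_per_page=2000):
    """Expected number of wrong characters on one page.

    accuracy is the fraction of characters recognized correctly
    (0.98 for 98%). chars_per_page is an assumed page size, not a
    measured figure.
    """
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    return (1.0 - accuracy) * chars_per_page


# Compare the accuracy levels mentioned in this thread.
for accuracy in (0.98, 0.99, 0.9999):
    errors = expected_errors_per_page(accuracy)
    print(f"{accuracy:.2%} accuracy: about {errors:.1f} errors per page")
```

Under that assumption, 98% accuracy leaves about 40 errors on every
page, while 99.99% leaves about one error every five pages. Either
way some errors remain, which is why human proofreading still matters.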
You are welcome to help me and Project Runeberg proofread this 1906
encyclopedia article on Ceylon: http://runeberg.org/nfbe/0020.html
Just scroll down below the image, and you will find the OCR text and a
link that lets you enter the proofreading mode. The page itself is in
Swedish, but most of the user interface is in English. All a
proofreader needs to do is make sure that the text corresponds to
the page image.
All submitted edits are stored under version control, so there is no
risk that you will erase or damage our collection. We have 100,000
pages like this online and only 100 submitted edits per day, so we
need more proofreaders. The 300 most recent changes are listed on
http://runeberg.org/rc.pike
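A back-of-the-envelope sketch of what those numbers mean for the
backlog follows. It assumes, only for illustration, that each
submitted edit fully proofreads one distinct page; in practice an
edit may fix a single word, so the real figure is likely larger. The
function and parameter names are made up for this example.

```python
def days_to_cover(pages, edits_per_day, pages_per_edit=1.0):
    """Days needed to touch every page once at a steady edit rate.

    pages_per_edit is an assumption: how many pages one submitted
    edit finishes on average (1.0 means one edit completes one page).
    """
    if edits_per_day <= 0 or pages_per_edit <= 0:
        raise ValueError("rates must be positive")
    return pages / (edits_per_day * pages_per_edit)


# Figures from the message: 100,000 pages, 100 edits per day.
days = days_to_cover(100_000, 100)
print(f"about {days:.0f} days, or {days / 365:.1f} years")
```

With the figures above, one pass over the collection takes roughly
1,000 days, which is why more proofreaders are needed.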
--
Lars Aronsson (lars@...)
Project Runeberg - free Nordic literature - http://runeberg.org/