via Google Blogoscoped by Tony Ruscoe on 6/9/09
Google Translator Toolkit is a
new tool being launched today to help translators organize their work and
benefit from shared translations, glossaries and translation memories, the Google China Blog
reports (English
translation by Google).
Evidence that Google was
working on a service like this originally surfaced in August 2008 when
references to Google
Translation Center appeared in Google’s robots.txt file. At the time, the
service was only available to Trusted Testers and most of the pages and
screenshots were quickly taken offline. Since those screenshots were produced,
it's clear that a lot of changes have been made to the tool.
The Translation Process

The
Google Translator Toolkit Workbench, showing side-by-side editing of
Wikipedia's Google article.
For those not familiar with
standard translation processes, a professional translator is likely to use a Computer-aided
translation (CAT) tool to help identify and extract snippets of text for
translation from various file types.
Google Translator Toolkit
currently only allows users to upload HTML, Microsoft Word, OpenDocument Text,
Rich Text and Plain Text documents up to 1MB for translation. Alternatively,
it's possible to enter the URL of a file on the web, select a Wikipedia article or a Knol for translation.
Once uploaded or selected,
files can be translated using the Workbench interface which shows the source
text and the target language translations either side-by-side or above and
below each other.

Previously
translated segments from the translation memory are suggested and can be rated
by yourself and others.
One good reason to share
translations with others is so that they can be reviewed for consistency and
style. Google allows users to rate translated segments, presumably for style
and accuracy. Comments can also be added to the target document, which is especially
useful when collaborating with other users.
Translation Memories

In
addition to the global translation memory, users can also create and share
their own TMs.
Many CAT tools allow the
translator to store their human translations in a database called a translation memory.
The memory can then be used to help with future translation projects by
checking to see whether a certain word, phrase, sentence or segment has been
translated before. Even if it's not exactly the same phrase, the translation
memory can be used to suggest what's called a fuzzy match, often indicated by a percentage
to reflect how similar the text is.
When translating Wikipedia
articles and Knols, the translations are stored in a global, shared translation
memory that's available to everyone by default. That means previously
translated phrases from these articles are stored and available for use by other
translators using the service, so if they ever find themselves translating the
same piece of text, Google will automatically populate the interface with the
previous translations to help save time.
Google's support article explains
the process:
Pretranslating your documents
When you upload a document into Google Translator Toolkit, we automatically 'pretranslate' your document as follows:
- We divide your document into segments, usually sentences, headers, or bullets.
- We search all available translation databases for previous human translations of each segment.
- If any previous human translations of the segment exist, we pick the highest-ranked search result and 'pretranslate' the segment with that translation.
- If no previous human translation of the segment exists, we use machine translation to produce an 'automatic translation' for the segment, without intervention from human translators.
We realize for some translators, pre-filling with machine translation may actually slow, not speed up, the translation process. In such cases, you can change your settings to pre-fill the segment with the source text, so you can type over the source text instead of making corrections to automatic translation.
Uploaded documents can
benefit from using this global TM too, but if users don't want to share their
translations with everyone, they can create their own translation memories and
control exactly which users can make additions and rate translations.
Translators already using CAT
tools may have translation memories stored in the Translation
Memory eXchange (.tmx) open standard XML format. Google allows translations
contained in those TMs to be uploaded and added to existing Google Translator
Toolkit TMs, providing they're no larger than 50MB and confirm to TMX 1.0 or
higher.
TMs other than the global TM
can also be searched for previously translated segments which can then be rated
without opening a translation document.
Glossaries
Glossaries are collections of
words and phrases with definitions and notes associate with them. They are
often used in the translation process to help choose which phrase is most appropriate
and to maintain consistency between translations of technical or specialty
subjects. Google Translator Toolkit requires CSV format glossaries to be
uploaded (it's not possible to create one from scratch) which will then be
automatically searched for terminology in the segments that are currently being
translated.
Learn More
For a really quick overview
of some of these features in action, you can watch this YouTube video:
How could this be useful to
Google?
A machine
translation of the Google China Blog explains, "Google's mission is to
organize the world's information and make it universally accessible and useful.
Translation of information, in our view is the key to access to
information."
Google has been working on a
statistical machine translation system for a few years now, which it started to
use for Google Translate instead of
Systran in October
2007. Since then it's been slowly integrating translation into many of its
services, including Google Toolbar, Google Talk, Google Reader, Gmail, and
YouTube. There's even an AJAX
Language API which anyone can use to build upon.
In my opinion, this latest
tool has clearly been designed to help improve Google's translation offerings.
One thing on which statistical machine translation relies is aligned translations. In
very simple terms, to help train a statistical machine translation system, text
in one language is fed into the system alongside the same text in another
language. Will enough text, the system can start to learn how certain phrases
should be translated. Without aligned translations, there's no easy way to know
exactly which sentence in the source document relates to the translated
version. That's where translation memories are very useful; they contain
aligned translations.
There are literally thousands
of Wikipedia articles being translated all the time, but the translations
aren't usually maintained in a translation memory. Through using Google
Translator Toolkit, translators could benefit from seeing previously translated
text from the global translation memory and, in return, Google could clearly
benefit from translators using its interface to translate any content that's
then stored as aligned translations in their global TM, which it can ultimately
use to enhance its statistical machine translation system and improve the
translations that are provided to end-users of any service using Google
Translate.
And as the global TM grows,
it might even be possible for end-users to get near-to-human-quality for
translations of their documents, websites, blog posts, emails and tweets
instantly.
[Thanks TOMHTML!]
Disclaimer: I am an employee
of SDL, a translation company that provides
translation services and software.
[By Tony Ruscoe |
Origin: Google
Translator Toolkit | Comments]
[Advertisement] Google books at eBay:
background info on Google, AdWords, AdSense, Blogger and more...
Things you can do from here:
- Subscribe
to Google Blogoscoped using Google Reader
- Get started using Google
Reader to easily keep up with all your favorite sites
__________ Information from ESET NOD32 Antivirus, version of virus signature database 3811 (20090129) __________
The message was checked by ESET NOD32 Antivirus.
http://www.eset.com