Bacchus BG wrote:
> I look for BLEU rates of the following language pairs:
> English-Spanish and Spanish-English for "Google Translate" and
> "Babelfish".
It would depend on the reference texts, wouldn't it? The question is,
where can you find exact translations in EN-ES and ES-EN for free? The
Wikipedia versions are often not exact translations. However, ProZ.com
and possibly other translator portals let users put sample translations
up on their profiles. These may be a place to mine exact translations
for use as reference texts in a BLEU analysis.
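For those who do program, here is a rough sketch of what a BLEU calculation over such reference texts could look like. It is a minimal, unsmoothed corpus-level BLEU with a single reference per segment and plain whitespace tokenisation; the function names and the sample sentences are my own assumptions for illustration, not the output of any particular tool.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of length n in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(candidates, references, max_n=4):
    """Unsmoothed corpus-level BLEU, one reference per segment.

    candidates, references: parallel lists of token lists.
    Returns a score between 0.0 and 1.0.
    """
    matched = [0] * max_n   # clipped n-gram matches, per n
    total = [0] * max_n     # candidate n-grams, per n
    cand_len = ref_len = 0
    for cand, ref in zip(candidates, references):
        cand_len += len(cand)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            c, r = ngrams(cand, n), ngrams(ref, n)
            # Clip each candidate n-gram count by its count in the reference.
            matched[n - 1] += sum(min(count, r[g]) for g, count in c.items())
            total[n - 1] += sum(c.values())
    if cand_len == 0 or min(matched) == 0:
        return 0.0  # no smoothing: any empty precision gives zero
    log_precision = sum(math.log(m / t) for m, t in zip(matched, total)) / max_n
    # Brevity penalty punishes output shorter than the reference.
    bp = 1.0 if cand_len > ref_len else math.exp(1 - ref_len / cand_len)
    return bp * math.exp(log_precision)


# Hypothetical data: one machine-translated segment and its reference.
mt = ["the cat sat on a mat".split()]
ref = ["the cat sat on the mat".split()]
print(round(100 * corpus_bleu(mt, ref), 1))
```

Published BLEU figures usually depend on the tokenisation and on the number of references, so scores from a sketch like this are only comparable with each other.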
You could do a similar analysis by aligning the reference translations
into TMs, reversing the TMs, and then performing CAT translation on
the machine translation target text. This will not be a BLEU analysis,
but it will be a non-programmer's way of evaluating machine
translation that is remotely similar to BLEU (since it would
result in a match percentage).
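To give an idea of the kind of percentage this produces, here is a small sketch that imitates a TM lookup. It assumes word-level similarity from Python's difflib as a stand-in for a CAT tool's fuzzy matcher; real tools use their own algorithms, so their numbers will differ. The function names and sample segments are hypothetical.

```python
from difflib import SequenceMatcher


def match_percent(mt_segment, reference_segment):
    """Word-level similarity of two segments, as a 0-100 percentage.

    Assumption: difflib's ratio is only a stand-in for a CAT tool's
    fuzzy match score, not the algorithm any specific tool uses.
    """
    a = mt_segment.lower().split()
    b = reference_segment.lower().split()
    return round(100 * SequenceMatcher(None, a, b).ratio())


def best_matches(mt_segments, tm_segments):
    """For each MT segment, the best match percentage found in the TM."""
    return [
        max((match_percent(mt, ref) for ref in tm_segments), default=0)
        for mt in mt_segments
    ]


# Hypothetical data: the "TM" holds the human reference translations.
tm = ["the cat sat on the mat", "the dog barked all night"]
mt_output = ["the cat sat on a mat", "the dog barked all night"]
print(best_matches(mt_output, tm))
```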
In a translated Wordfast document, the match percentages are contained
in the document itself, so it would be possible to process a lot of text
in a short time. In OmegaT, you can set a minimum match threshold for
automatic match insertion, but if you want to note individual match
percentages, you'd have to sit there and watch the screen in real time.
Some Gettext tools offer fuzzy translation with a customisable match
percentage, so you could run such an operation several times with
different fuzzy thresholds and run pocount on the results to see the
overall percentages for large numbers of files.
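The threshold idea can also be sketched without any Gettext tooling, assuming you already have one match percentage per segment from somewhere. The function name, the thresholds and the sample percentages below are made up for illustration.

```python
def share_at_or_above(percentages, thresholds=(100, 90, 75, 50)):
    """Percentage of segments whose match score reaches each threshold."""
    total = len(percentages)
    if total == 0:
        return {t: 0.0 for t in thresholds}
    return {
        t: 100.0 * sum(p >= t for p in percentages) / total
        for t in thresholds
    }


# Hypothetical per-segment match percentages.
scores = [100, 83, 60, 40]
for threshold, share in share_at_or_above(scores).items():
    print(f">= {threshold}%: {share:.1f}% of segments")
```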
Samuel