[multiposted from rosetta-l]
G'day everyone
Do any of you know of TM systems that use a thesaurus in the matching
process? I know Wordfast uses a thesaurus for weeding out useless terms
while doing a term extraction, but it doesn't use the thesaurus when
doing fuzzy matching of translated segments, and I know of no other tool
that does so either.
The idea came to me while watching Jost's "objective" videos of the
various CAT tools on his Jeromobot web site. Normally, the example
sentence "Die Katze ist schwarz" would be considered a 50% match of
"Der Hund ist weiss", and would therefore not be proposed by the TM
system. However, the words "Der" and "Die" have (almost always) the same
English translation, namely "The". If a thesaurus could tell the TM
system that "Der" and "Die" are functional equivalents or near-equivalents,
the match percentage would be 75% (or 74%, if the thesaurus
triggers a 1% match penalty per word).
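
The mechanism can be sketched roughly as follows, in Python. Everything here
is an assumption made up for illustration: the position-by-position word
comparison, the function names and the equivalence list are hypothetical and
do not describe how any existing TM tool scores segments.

```python
# Hypothetical equivalence groups for German source text (an assumption,
# not data from any real TM tool).
DE_EQUIVALENTS = [{"der", "die", "das"}]


def are_equivalent(a, b, groups):
    """True if both words appear in the same equivalence group."""
    return any(a in g and b in g for g in groups)


def match_percent(new_segment, tm_segment, groups=(), penalty_points=1):
    """Naive position-by-position word match, as a whole-number percentage."""
    new_words = new_segment.lower().split()
    tm_words = tm_segment.lower().split()
    longest = max(len(new_words), len(tm_words))
    if longest == 0:
        return 0
    matched = 0
    via_thesaurus = 0
    for a, b in zip(new_words, tm_words):
        if a == b:
            matched += 1
        elif are_equivalent(a, b, groups):
            # Counts as a match, but is remembered so it can be penalised.
            matched += 1
            via_thesaurus += 1
    return max(0, round(100 * matched / longest) - penalty_points * via_thesaurus)


new = "Die Katze ist schwarz"
old = "Der Hund ist weiss"
print(match_percent(new, old))                     # 25
print(match_percent(new, old, DE_EQUIVALENTS))     # 49
print(match_percent(new, old, DE_EQUIVALENTS, 0))  # 50
```

The absolute numbers differ from the percentages quoted above because this
scoring is deliberately simplistic. The point is only the direction of the
effect: the equivalence list raises the score, and the per-word penalty
takes one point back.
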
For translators who prefer to see as few matches as possible (I'm not one
of those), the process can also work in reverse, by telling the TM
system not to regard "The" as a matchable word when translating into German.
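
One possible reading of the reverse case, again as a hypothetical Python
sketch: words on a user-supplied ignore list are dropped from both segments
before the comparison, so they can no longer contribute to a match. The
function name, the ignore list and the word-by-word scoring are assumptions
for illustration only.

```python
def match_percent_ignoring(new_segment, tm_segment, ignored=frozenset()):
    """Naive word match that skips words the user marked as non-matchable."""
    new_words = [w for w in new_segment.lower().split() if w not in ignored]
    tm_words = [w for w in tm_segment.lower().split() if w not in ignored]
    longest = max(len(new_words), len(tm_words))
    if longest == 0:
        return 0
    # Compare the remaining words position by position.
    matched = sum(1 for a, b in zip(new_words, tm_words) if a == b)
    return round(100 * matched / longest)


new = "The cat is black"
old = "The dog is white"
print(match_percent_ignoring(new, old))                   # 50
print(match_percent_ignoring(new, old, ignored={"the"}))  # 33
```

With "the" ignored, the only remaining common word is "is", so the score
drops and the segment is less likely to be proposed.
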
I suspect the ideal implementation of this idea would be
user-created lists per language pair, so it would not be a real thesaurus.
However, I can also see that a good thesaurus (with near-synonyms only)
could be useful in increasing possibly useful matches.
Does this concept ring a bell anywhere?
Samuel