Thanks Trond,
The spell checker we are developing is based on Hunspell (in OpenOffice). There are a number of ways in Hunspell to get around this problem, and we have been experimenting with them. Kindly give me some more information on the platform you are working on and how the relaxer is implemented.
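To make the problem concrete outside of any particular spell checker, here is a minimal sketch, assuming Python and its standard unicodedata module (it is not Hunspell code, and the helper name canonical is only illustrative). It shows three encodings of the same Yoruba letter, o with dot below and acute, that compare as different strings until both are brought to one normalization form (NFC here):

```python
import unicodedata


def canonical(word: str) -> str:
    """Return the NFC form, so canonically equivalent sequences compare equal."""
    return unicodedata.normalize("NFC", word)


# Three ways of encoding the same visible letter (o + dot below + acute).
dot_then_acute = "o\u0323\u0301"   # base letter, dot below, then acute
acute_then_dot = "o\u0301\u0323"   # base letter, acute, then dot below
precomposed = "\u1ecd\u0301"       # precomposed o-with-dot-below, then acute

# As raw strings they are all different "spellings".
assert dot_then_acute != acute_then_dot
assert dot_then_acute != precomposed

# After normalization they are identical.
assert canonical(dot_then_acute) == canonical(acute_then_dot) == canonical(precomposed)
```

If both the word list and the text to be checked pass through the same normalization step, the order in which the diacritics were typed should no longer matter.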
Cheers
Tunde
> From: trond.trosterud@...
> To: a12n-collaboration@...
> Subject: Re: [A12n-Collab] Re:Tech support for Yoruba orthography
> Date: Mon, 29 Dec 2008 21:14:21 +0100
> CC: k.lawal1@...; odutola@...; adenuga@...; eadagun@...; ayschlei@...; dotogundeji@...; info@...; andrewc@...; sany@...; alukome@...; afonrereyoruba@...; bomiola@...; jkolupona@...; awoyale@...; info@...; lokehinde@...; toyin.falola@...; bamgbose@...; africanoracle@...; roposek@...; ilesanmi@...; matto1@...; walter@...; tundeojo@...; babaloba@...; oyegbola@...; oosasona@...; molarawood@...; jfakinlede@...; fomidire@...; mokome@...; valojo@...; sadef@...; profadelugba@...; osoriyan@...
>
>
> Tunde Adegbola wrote on 27 Dec. 2008 at 11:05:
>
> > (...) another flavor of the same problem in my efforts in developing
> > a Yoruba spell checker. Even when the glyphs look right, different
> > sequences of application of the diacritics are seen by the computer
> > as different spellings.
>
> One way of solving that is to enrich your spellchecker with a spell
> relaxer. For some Sámi languages, there is a habit of writing æ, ø
> in Norway and ä, ö in Sweden. Furthermore, ä, ö may in texts be
> either precomposed or composed as a, o + combining ¨. We have an
> (xfst) file containing lines like the following (ä, etc. are seen as
> facultative realisations of æ etc.):
>
> æ (->) ä , æ (->) ä , Æ (->) Ä , Æ (->) Ä
>
> Details will vary between different spellcheckers, but this should not
> be a problem.
>
> > 2. Regarding input, a standard layout is very important. In
> > addressing this issue however, we must take note that it is not
> > enough to be able to realize these characters. The question of
> > typing speed and efficiency is also relevant.
>
>
> Yes, careful thought should be put into keyboard design, and different
> alternatives should be thoroughly tested by language users. This issue
> is unfortunately as relevant today as it was four years ago, when I
> posted the following set of principles for keyboard layout to this
> list (see below):
>
> Here are some keyboard examples: Northern Sámi has 7 non-ascii
> letters, and Skolt Sámi has 15, and they may thus exemplify medium-
> and large-size alphabets. My native Norwegian keyboard (3 non-ascii)
> then exemplifies a small-size alphabet. Moral: Everything is possible,
> but every key should be carefully planned.
>
> Trond.
>
> Extract from my posting of 27.10.04:
>
> 1. Designing a keyboard is an important issue, and every key, every
> letter placement should be carefully evaluated before placement.
>
> 2. Keyboard design is a conservative enterprise. Typing habits sit in
> the fingers of the users. Before a new keyboard is made, (all) older
> existing keyboard layouts should be presented (are there typewriters
> from colonial times, what is in use now, which layouts are
> dominating?) and evaluated. When there are several conflicting
> computer and typewriter layouts in use, the conservative factor should
> be taken into account (don't change anything until there are good
> reasons to do so).
>
> 3. Keyboards should take the keyboard of the dominant colonial
> language as a starting point. Thus, when a language (say, Fulfulde),
> is written both in Francophone and Anglophone countries, it will
> probably need both a qwerty and an azerty keyboard, since its users
> will be used to qwerty and azerty, respectively. Everything else on
> the keyboards should be as similar as possible.
>
> 4. A statistical survey of the language in question should be
> undertaken, in order to get a frequency list of the different letters.
> If some a-z letters (e.g. q, w, x, y) are not part of the
> orthography in question, whereas e.g. hooked letters (ɓ, ɗ, ɠ) are,
> one should consider having common non-a-z letters on the position
> of unused a-z ones. As for the hooked letters, they may be placed
> under (SHIFT+)option-b,d,g as well, but this is the type of
> considerations that must be made, in light of statistics.
>
> 5. Keyboard standardisation, that is, getting one standard, or as few
> as possible, is important. Users don't want to learn good touch typing
> habits, just to learn that their next work place or next computer uses
> a different layout. On the other hand, private variation is not
> harmful, as long as it is kept private: a user may make his or her own
> idiosyncratic keyboard, and still we all can read the output. Stubborn
> protesters in a minority position should thus just be left in peace
> with their own layout. Private code tables, on the other hand, are
> impossible.
>
> 6. During the keyboard standardisation phase, it is a good idea to
> make different keyboard layouts (just as has been done on this list),
> and test them out on skilled typists.
>
> 7. Finally, the job isn't done until all the non-letter characters are
> placed as well. Tone marks will be made by dead keys, and since
> languages do have more letters than a-z, non-letter symbols will have
> to be shuffled around. Here, inspiration could be taken from other
> languages that have been through the same process. It would also be a
> good idea to try to achieve African unity, and do it the same way as
> the neighbour did, unless there are reasons to do otherwise.
>
> Keyboards are thus preferably made by paper, pencil, and careful
> thought. People to implement them are of course needed - and welcome
> - but the process of deciding what to implement is long, and needs
> input from linguists, localisers, and experienced typists.
>
> ----------------------------------------------------------------------
> Trond Trosterud t +47 7764 4763
> Institutt for språkvitskap, Det humanistiske fakultet m +47 950 70140
> N-9037 Universitetet i Tromsø, Noreg f +47 7764 5216
> Trond.Trosterud (a) hum.uit.no http://www.hum.uit.no/a/trond/
> dn------------------------------------------------------------------đŋ
>
>
>
-----------------------------------------------------------------------------------------------
Tunde Adegbola (Ph.D.)
Executive Director
African Languages Technology Initiative
(Alt-I ... Inserting African issues into the agenda of the knowledge age)
11 Oluyole Way, New Bodija Ibadan, Nigeria.
+234 8034019398
------------------------------------------------------------------------------------------------