> Dear all, good morning and compliments of the season.
>
> At the user end, not on the computer but on the internet, display of some
> fundamentally important Yoruba letters, especially S s with underdot and
> E e, O o with tonal signs plus underdots, remains globally problematic
> until now in emails, Yahoo, Google, other e-groups, etc.
> There is something wrong somewhere. Some workable solution can and must be
> found as soon as possible. At your earliest convenience, please share your
> own updated insight on related issues raised below by Dr Osborn, especially
> item 4), in the interest of Yoruba language usage on the internet.
From Dr. Samuel Kayode Olamijulo
From: Don Osborn <dzo@...>
To: Dr. Samuel Olamijulo <samola43@...>; Andrew Cunningham <andrewc@...>
Cc: Dr. Tunde Adegbola <taintransit@...>; KONYIN oyegbla <oyegbola@...>; Prof. Kunle Kehinde <lokehinde@...>; Prof. Yiwola Awoyale <awoyale@...>; kole odutola <odutola@...>; Prof. Antonia Yetunde Schleicher <ayschlei@...>; Dr. Kayode Fakinlede <jfakinlede@...>; Andrew Cunningham <andrewc@...>; Soji YorubaForKids <info@...>; Adedoyin S. Adenuga <adenuga@...>; Folabomi Olamijulo-Oki <bomiola@...>; Dr. A.O. Onayemi <info@...>; Dr. Babatola Aloba <babaloba@...>; fomidire@...; Toyin Falola <toyin.falola@...>; molarawood@...; eadagun@...; oosasona@...; mokome@...; osoriyan@...; k.lawal1@...; profadelugba@...; jkolupona@...; bamgbose@...; alukome@...; sany@...; roposek@...; dotogundeji@...; ilesanmi@...; walter@...; africanoracle@...; tundeojo@...; sadef@...; Arabinrin Molara Wood <molarawood@...>; Olootu Jare Ajayi <afonrereyoruba@...>; Martin 'matto' Akindana <matto1@...>
Sent: Saturday, December 22, 2007 11:00:43 AM
Subject: Tech support for Yoruba orthography (RE: [A12n-Collab] Re: 5 categories of African orthographies ...)
Greetings of the Season to you too Samuel, and Andrew (hope it's okay to be less formal all around), and to all who are reading this.
Permit me to put this conversation in context for the others. There is now a discussion on the "A12n-collaboration"* list concerning support for Latin (Roman) orthographies of African languages. I've proposed a system of classification of Latin-based orthographies in Africa based on support in Unicode (ISO 10646) for the characters included in them. (Category 1 is ASCII, like English, Swahili, Zulu; Category 2 is "Latin-1" or ISO 8859-1, like West European languages, Sango; Category 3 includes special characters and sometimes also diacritic characters supported in category 2, like Hausa, Wolof; Category 4 is distinguished by the use of "combining diacritics" usually for tones, like in Yoruba, Dinka; Category 5 are cases - very few by now - where a character in the orthography is not at all provided for in Unicode). The purpose of this system is simply to provide a common terminology. For those familiar with Conrad Taylor's work on "Typesetting African Languages" (published in 2000, but already dated), this schema is similar in some respects but different in its reference to Unicode support.
(All the above addresses only issues with Latin-based orthographies. Non-Latin scripts are important also, but have separate issues.)
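The five categories can be illustrated with a rough sketch in Python. The function name `orthography_category` and the sample words are assumptions made for illustration only; the function classifies a text sample rather than a whole orthography, and it cannot detect category 5, since a character that is not in Unicode cannot appear in encoded text.

```python
import unicodedata


def orthography_category(sample: str) -> int:
    """Guess which category (1-4) a text sample falls into.

    Hypothetical helper: category 5 is never returned because characters
    missing from Unicode cannot occur in a Python string.
    """
    text = unicodedata.normalize("NFC", sample)
    # Combining marks that survive NFC have no precomposed form: category 4.
    if any(unicodedata.combining(ch) for ch in text):
        return 4
    highest = max((ord(ch) for ch in text), default=0)
    if highest < 0x80:       # plain ASCII
        return 1
    if highest <= 0xFF:      # fits in Latin-1 (ISO 8859-1)
        return 2
    return 3                 # needs characters beyond Latin-1


print(orthography_category("Habari"))             # 1 (ASCII only)
print(orthography_category("S\u00e4ng\u00f6"))    # 2 (a and o with diaeresis/circumflex range of Latin-1)
print(orthography_category("\u0257aya"))          # 3 (hooked d, outside Latin-1)
print(orthography_category("\u1ecd\u0301"))       # 4 (subdot o plus combining acute)
```

Note that this is only a sketch: punctuation such as curly quotes would push an otherwise ASCII sample into category 3, so real text would need filtering first.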
The category 4 orthographies - for languages like Yoruba and Dinka (or in Asia, Vietnamese) - involve combinations of diacritics that are provided for in Unicode by dynamic composition. This means that, for example, the subdot o with a tone mark is represented by two characters rather than a single one, though ideally this looks to the user like the single character. Input can be facilitated by keyboard systems designed for the languages.
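As a small illustration of dynamic composition, the following Python sketch (using only the standard `unicodedata` module; the helper `code_points` is a name introduced here) shows how the subdot o with acute accent is stored. In composed form (NFC) it is two code points, and in fully decomposed form (NFD) it is three.

```python
import unicodedata


def code_points(text: str) -> list[str]:
    """Return the code points of text in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text]


# o + COMBINING DOT BELOW + COMBINING ACUTE ACCENT
typed = "o\u0323\u0301"

nfc = unicodedata.normalize("NFC", typed)
nfd = unicodedata.normalize("NFD", typed)

print(code_points(nfc))  # ['U+1ECD', 'U+0301']
print(code_points(nfd))  # ['U+006F', 'U+0323', 'U+0301']

# Typing the marks in the other order gives canonically equivalent text.
other_order = "o\u0301\u0323"
print(unicodedata.normalize("NFC", other_order) == nfc)  # True
```

The base letter and the dot below compose into a single code point, but no single code point exists for the combination with the tone mark, so the acute accent stays a separate combining character.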
The rub is that the implementation of this system of dynamic composition is still uneven, and on older computer systems problematic, as Andrew points out. Standards for input have not been established either. Most experts agree that the system works, but that more effort by companies and developers is needed to fully support it. Andrew's extensive comments give excellent coverage of the state of the art. On the other hand, some other experts discuss adding precomposed characters to Unicode to better handle diacritics, although this is problematic on several counts.
Part of the advantage of discussions like the one on A12n-collab and elsewhere is to clarify the issues currently involved in support of "category 4 orthographies." The question of precomposed characters (such as, in Yoruba, a single character for the subdot-o with acute accent, and so on) has been hanging on the sidelines for some years, and I'm hoping it can be disposed of appropriately. I'm not saying that we need to encode precomposed characters - it may seem more convenient but ultimately may be inconsequential given the way the technology is developing. What I am suggesting is that, after hearing the issue brought up from time to time by a few very good people, it is worth getting it out there again, to air any issues and resolve them as best we can, so that greater attention can be given to developing the needed support. Thanks to Samuel and Andrew for helping to do that.
This issue is not new, which is all the more reason to get past it one way or another, so as to help people move ahead with localization in Yoruba.
The New Year 2008 will be the International Year of Languages. Hopefully it can also be a year of positive action for African languages and information and communication technologies. Again, best wishes for the Holidays and for a Happy New Year.
Don Osborn
*A12n-collaboration was originally set up in March 2002 as an informal online working group on technical aspects of using African languages (text) on computers and the internet. Since 2004 it has been mirrored on Linguist List. For more information, see: http://lists.kabissa.org/mailman/listinfo/a12n-collaboration . Issues of Unicode support and input for African languages have been extensively discussed on this list. Some online resources on Yoruba and ICT include:
PanAfriL10n.org page http://www.panafril10n.org/wikidoc/pmwiki.php/PanAfrLoc/Yoruba , and
"Yoruba language and ICT" http://www.quicktopic.com/15/H/KKgbRqJUAR8
From: Dr. Samuel Olamijulo [mailto:samola43@...]
Sent: Saturday, December 22, 2007 7:21 AM
To: Andrew Cunningham
Cc: Dr. Tunde Adegbola; KONYIN oyegbla; Prof. Kunle Kehinde; Prof. Yiwola Awoyale; kole odutola; Prof. Antonia Yetunde Schleicher; Dr. Kayode Fakinlede; Andrew Cunningham; Soji YorubaForKids; Adedoyin S. Adenuga; Folabomi Olamijulo-Oki; Dr. A.O. Onayemi; Dr. Babatola Aloba; fomidire@...; Toyin Falola; molarawood@...; eadagun@...; oosasona@...; mokome@...; osoriyan@...; k.lawal1@...; profadelugba@...; jkolupona@...; bamgbose@...; alukome@...; sany@...; roposek@...; dotogundeji@...; ilesanmi@...;
walter@...; africanoracle@...; tundeojo@...; sadef@...; Don Osborn; Arabinrin Molara Wood; Olootu Jare Ajayi; Martin 'matto' Akindana
Subject: Re: [A12n-Collab] Re: 5 categories of African orthographies (Latin)
Thank you very much Mr Andrew Cunningham for, as usual, very helpful comments.
All participants must continue to keep focussed on effective, user-friendly systems that stand a chance of easy, widespread usage among us ordinary folks. We cannot wait to see the day when text and other exchanges in Yoruba on the internet will become seamless practice among the over 100 million people with Yoruba ancestry located in different areas of today's world.
Once again, we thank you very much.
From Dr. Samuel Kayode Olamijulo
*******************************************
----- Original Message ----
From: Andrew Cunningham <andrewc@...>
To: Dr. Samuel Olamijulo <samola43@...>
Cc: Dr. Tunde Adegbola <taintransit@...>; KONYIN oyegbla <oyegbola@...>; Prof. Kunle Kehinde <lokehinde@...>; Prof. Yiwola Awoyale <awoyale@...>; kole odutola <odutola@...>; Prof. Antonia Yetunde Schleicher <ayschlei@...>; Dr. Kayode Fakinlede <jfakinlede@...>; Andrew Cunningham <andrewc@...>; Soji YorubaForKids <info@...>; Adedoyin S. Adenuga <adenuga@...>; Folabomi Olamijulo-Oki <bomiola@...>; Dr. A.O. Onayemi <info@...>; Dr. Babatola Aloba <babaloba@...>; fomidire@...; Toyin Falola <toyin.falola@...>; molarawood@...; eadagun@...; oosasona@...;
mokome@...; osoriyan@...; k.lawal1@...; profadelugba@...; jkolupona@...; bamgbose@...; alukome@...; sany@...; roposek@...; dotogundeji@...; ilesanmi@...; walter@...; africanoracle@...; tundeojo@...; sadef@...; Don Osborn <dzo@...>
Sent: Saturday, December 22, 2007 6:20:39 AM
Subject: Re: [A12n-Collab] Re: 5 categories of African orthographies (Latin)
Two issues here.
1) Web developers need to get their act together and develop web sites optimised for Yoruba. Very few, if any, actually are. Web applications and web services need to be optimised. Part of this is buried in the application code, but a lot of African language enablement can be handled by customising a web application's skins, themes or templates.
For instance, a Dinka language blog running on WordPress MU would need to alter the core CSS files so that the WordPress template uses an appropriate font to display Dinka text. A virtual keyboard system could be implemented via the template as well, and the value of the primary text processing language in the template could be changed.
These steps increase WordPress' support for Dinka, and are fairly easy to do. The same would hold true for Yoruba.
Obviously, more complicated steps are possible: translating the text strings in the template into Dinka, or adding a wrapper around string functions so that input is normalised before processing (although this may be better achieved as a plugin or extension).
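The idea of a wrapper that normalises input before processing can be sketched as follows. This is a minimal Python illustration, not WordPress code (WordPress itself is written in PHP); the decorator `nfc_input` and the function `contains` are hypothetical names introduced here.

```python
import functools
import unicodedata


def _to_nfc(value):
    """Normalise str values to NFC; pass other values through unchanged."""
    return unicodedata.normalize("NFC", value) if isinstance(value, str) else value


def nfc_input(func):
    """Decorator: normalise every str argument to NFC before calling func."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        args = [_to_nfc(a) for a in args]
        kwargs = {key: _to_nfc(val) for key, val in kwargs.items()}
        return func(*args, **kwargs)
    return wrapper


@nfc_input
def contains(haystack, needle):
    """Substring search that is safe against differently normalised input."""
    return needle in haystack


stored = "\u1ecd\u0301m\u1ecd"   # text already stored in NFC
typed = "o\u0301\u0323"          # same letter, typed with marks in another order

print(typed in stored)           # False: raw comparison misses the match
print(contains(stored, typed))   # True: both sides are normalised first
```

Wrapping string functions this way keeps the normalisation in one place, which is roughly what a plugin or extension would do for a real web application.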
Similar processes are possible with other web applications.
2) End user issues: It's possible to set up a computer so that a web browser can correctly display Yoruba. This is fairly straightforward, but pointless unless 1) is covered properly.
When you configure a computer to support Dinka, Yoruba or any other language requiring combining diacritic support, the actual steps required will depend on the particular operating system being used and, in the case of the MacOS, on which applications are being used.
For Windows platforms you need an OpenType font that supports combining diacritics, and you need a web browser (IE, Firefox or Opera) that is using an appropriate version of Uniscribe. Each of these web browsers will use a local copy of Uniscribe if present, or the system version.
Uniscribe is bound by a EULA, and cannot be redistributed. In effect this means that you have access to the version shipped with your operating system, or to newer versions installed by system updates or application installs.
In practical terms this means that it is best to use Microsoft Windows XP (Service Pack 2) or Windows Vista. Both these versions of Windows ship with a version of Uniscribe that supports the use of combining diacritics with the Latin script.
For Windows XP SP2, you will need to install complex script support in order to successfully use combining diacritics: http://www.mylanguage.gov.au/lac/135.pdf
You will also need to download an OpenType font to install. Appropriate fonts include Code2000, Doulos SIL, Charis SIL, Gentium Book (Beta), African Sans and African Serif; DejaVu Sans may be worth a try as well. Personally I'd install all of these.
On the other hand, Windows Vista will support Yoruba straight away. No configuration is required. One warning: Yoruba is listed as an input language, but no Yoruba keyboard layout is available. The core fonts on Windows have been updated and you should find some suitable for Yoruba and other African languages.
On Vista, I'd also install the fonts I listed above.
When creating a website, I'd avoid using the updated core fonts in the website's stylesheets, since this will just create all sorts of problems for users on older versions of Windows.
Now let's turn to the older versions of Windows. Not much luck here from the Microsoft side. There are ways of updating Uniscribe, but most methods would violate Microsoft's EULA.
The simplest approach is to use an alternative rendering technology.
The best approach is to use an older version of Firefox (1.5.0.x) that has been updated to use the Graphite font rendering technology. Then you install a Graphite font such as Doulos SIL or Charis SIL, and you are ready to go. This will work on Windows 2000 and Windows XP.
For Windows 98 and older, the Unicode support is so limited and partial that it's best not to use these for working with Yoruba.
For Linux:
I use Ubuntu with the Gnome desktop environment. Packages are available that integrate Graphite with Pango, allowing applications like Firefox to render text using Graphite fonts. This works very well on the latest versions of Ubuntu. You may also want to test the DejaVu OpenType font on the Ubuntu platform.
For the MacOS.
More complicated. It depends on whether the application is using AAT fonts or OpenType fonts, and whether applications using OpenType fonts support combining diacritics.
I need to do more experimenting with the MacOS.
When I have time I'll provide more detailed information.
But as I said, it doesn't matter if you prep your operating system and install the right fonts. A poorly designed and implemented website will prevent you from displaying text correctly.
For instance, Windows Live Hotmail, GMail and Yahoo Mail are now all UTF-8 based, if you are using the latest Yahoo and Hotmail UIs.
But the fonts these sites use are not suitable.
Internally, we use Firefox with the Stylish extension installed. We then write site- and domain-specific rules in CSS which selectively override the fonts used in various parts of the webmail UI. Usually we're using a mix of CSS 2.1 and CSS 3 selectors in these rules.
A possible rule for Yahoo Mail could be:
@namespace url(http://www.w3.org/1999/xhtml);
@-moz-document domain("mail.yahoo.com") {
  /* Override the font in the message list, headers and input fields */
  td[colname=Subject],
  td[colname=From],
  div.headerSender,
  div.headerSubjectLine,
  body:not([class~="appBody"]),
  .inputField { font-family: "Charis SIL" !important; }
}
This would force certain parts of the web application to use the font Charis SIL.
We have other sets of rules for Hotmail and GMail.
Something similar could be done with Yahoo Groups or Google Groups, etc.
This is a kludge. It would be better to develop sites optimised for the appropriate language rather than forcing an override of the website's CSS.
Unfortunately the end user has no control over what the web developer does, so kludges like this are sometimes necessary.
Two additional comments:
It is best to use NFC on the web. So web developers should use normalisation routines in their scripts. Quite easy to do.
And you need to know your tools. What kind of data is your keyboard layout producing? Is it NFC? Is it NFD? Or is it likely to be completely unnormalised?
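One way to find out what a keyboard layout produces is to paste its output into a short script. The following Python sketch compares the text against its own NFC and NFD forms; the function name `normalisation_form` is an assumption introduced for illustration.

```python
import unicodedata


def normalisation_form(text: str) -> str:
    """Report whether text is already in NFC, NFD, both, or neither."""
    is_nfc = unicodedata.normalize("NFC", text) == text
    is_nfd = unicodedata.normalize("NFD", text) == text
    if is_nfc and is_nfd:
        return "NFC and NFD"   # e.g. plain ASCII, nothing to compose or decompose
    if is_nfc:
        return "NFC"
    if is_nfd:
        return "NFD"
    return "unnormalised"


print(normalisation_form("\u1ecd\u0301"))    # NFC
print(normalisation_form("o\u0323\u0301"))   # NFD
print(normalisation_form("o\u0301\u0323"))   # unnormalised
print(normalisation_form("abc"))             # NFC and NFD
```

If the result is anything other than NFC, passing the text through `unicodedata.normalize("NFC", text)` before it is stored or sent would bring it into the form recommended above.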
On Sat, December 22, 2007 6:21 pm, Dr. Samuel Olamijulo wrote:
> Dear all, good morning and compliments of the season.
>
> At the user end, not on the computer but on the internet, display of some
> fundamentally important Yoruba letters, especially S s with underdot and
> E e, O o with tonal signs plus underdots, remains globally problematic
> until now in emails, Yahoo, Google, other e-groups, etc.
> There is something wrong somewhere. Some workable solution can and must be
> found as soon as possible. At your earliest convenience, please share your
> own updated insight on related issues raised below by Dr Osborn, especially
> item 4), in the interest of Yoruba language usage on the internet.
>
Andrew
--
Andrew Cunningham
Research and Development Coordinator
Vicnet
State Library of Victoria
Australia
andrewc@...