Saturday, December 31, 2016

Revisiting an African language content strategy

While in Mali in late 1999 and early 2000, inspired in part by early research by FUNREDES on Languages and Cultures of the Internet, I began thinking about strategies for increasing African language web content. I recognized, of course, that such content would come primarily from communities of speakers of African languages, as well as from internationally funded projects, which at the time were beginning to think about how to use the internet for development. The motivation was rather to help create an environment favorable to producing and using that content.

ISOC's report, 8/2016
Recent discussion of one project, reading about another (each of which deals in different ways with content and communication), and the Internet Society's (ISOC) August 2016 report on "Promoting Content in Africa" prompt me to revisit this early effort and look at how things are playing out.

Looking forward from 1999

The basic idea in 1999 was to disaggregate approaches to internet content development in African languages, and consider how each could optimally contribute to the overall goal of greater presence of those languages in cyberspace. In 2003 I reworked that schema to share more widely - for example on the short-lived Africa Web Content Owner email list.1 Elements of this strategy were incorporated in different ways in later work, such as African Languages in a Digital Age (ALDA).

The main approaches were:
  1. Composition of text-based content (including where possible, digitization of works previously published in African languages)
  2. Translation of text-based content from other languages (leveraging the then emerging machine translation [MT] technology)
  3. Development of content in non-text formats (with specific reference to audio2)
It was recognized that producing text-based content from scratch, by composing material (#1 above), is an incremental and artisanal process. In other words, it takes time and effort to achieve modest results. This is especially the case for languages with younger written traditions that are not well supported in education, or sometimes even in software (fonts were then a problem for certain writing systems, for example, and keyboards for them still are). However fundamental text content might be, its production would have a hard time keeping pace with content creation in many non-African languages, leaving speakers of African languages with few opportunities to see their languages online.

Therefore, taking existing texts in African languages - from published books or other printed materials - and putting them on the web was suggested as a way to give a small but significant boost to efforts to generate African language internet content. Such texts often have historical or cultural value, and may already be in a standard orthography (or in transcriptions that could easily be converted into it).3 A sustained effort to "weblish" these materials, according to this thinking, could quickly add quality material to what is available on the web in a number of languages and, more importantly, make many materials that are accessible only in university libraries more readily available to speakers of those languages. However, copyright protections limit the potential of this tactic (although some sites appear to have made some of these materials available online without permission).

Therefore, an emphasis was put on alternative ways to create new content, especially translation (#2 above) of various relevant, useful, and interesting material aided by MT, and content built around the spoken voice (#3), responding to oral dimensions of African cultures, as well as the low literacy rates in African languages.

MT in that era especially was mainly a hope for the future, and aside from a few experiments most advances in the 2000s were for pairs of major (mainly Europhone) languages. Nowadays the technology has improved, but the statistical methods that have been key in that evolution require language resources that do not exist for many languages (at least yet). As such, the contribution of MT to content development in African languages is still in the future. 

As for audio content, it did not emerge as a significant component of the internet (unless one counts songs or audio files sent as email attachments, both of which were transferred over the internet rather than presented as part of web content). But see the discussion of video sharing below.

Just how much African language content?

Through the 2000s, African language content seemed to grow only marginally and unevenly. A pair of studies published by Rifal in 2003 provided perspectives on this subject that are still useful:
I'm not aware of any more recent studies along these lines, but it may still be the case that a significant number of sites have at least some African language content, and that this content is still mostly descriptions.

New kinds of content

The rise of social media, video sharing, and mobile devices over the last decade or so has changed how we think of and produce web content, opening new possibilities for African languages in cyberspace.

Social media, including blogs and wikis, makes the creation of content in any language easier. But in the case of many African languages, it also brings us face-to-face with other limitations: in input systems (for extended Latin and non-Latin scripts), in education (where schools use only Europhone languages, so people aren't familiar with writing their first languages), and in incentive (where the audience for text-based content in less widely spoken African languages is perceived to be small).

Video in a way fulfills the old idea of audio content on the web, with the obvious advantage of visuals (though there are at least a few YouTube videos with a static presentation - a picture or line of text - and full audio in one or another African language). What's missing, as far as I can tell, is a way to find videos in specific languages that does not rely on the producer having tagged them appropriately (which may not happen).

Mobile devices have changed how we access and interact with content, and consequently how content is designed and even conceived. They have also become the most common way for Africans in general to access the internet - proportionately more important, I believe, than on any other continent. What I don't have a sense of is how much content in African languages is developed with mobile devices in mind. In any case, the input limitations for some writing systems would certainly be an issue for using some African languages in messaging, for example.

"Promoting Content in Africa," 2016

ISOC's recent report includes a look at structures to support development of content in Africa, including in African languages. It is encouraging to note the attention ISOC is giving in this report to the importance of African language content for internet use in Africa.

One of the recommendations ISOC has for promoting local content in Africa, including content in African languages, is to promote development of local infrastructure, including data centers, Content Delivery Networks, and Internet Exchange Points. This idea of, in effect, creating a facilitating environment for African language content is an interesting strategy, and would complement other efforts such as those mentioned above.

1. The direct link to the post in the AWCO archives is apparently accessible only to subscribed group members. I've created an alternative presentation of it on my website. That post has more background.
2. An early consideration of audio and web content on AWCO mentions Native American interest in the topic, as well as a project in Mauritania (also available on my website).
3. Numerous transcriptions of histories and tales made before the adoption of current orthographies used systematic notation that generally corresponds directly to characters used today (1:1 or occasionally 2:1). I have encountered this, for example, in various older materials on Fula and Bambara. Cheikh Anta Diop's famous translations of scientific and European cultural texts into Wolof (1955) similarly used a regular transcription that predated the current standard orthography in Senegal.

Friday, December 30, 2016

A meeting with ACALAN in Bamako

At the end of last month while in Bamako, Mali, I had the chance to meet with Adama Samassekou, who is currently serving in an advisory capacity with the African Academy of Languages (ACALAN), and with the current executive secretariat of ACALAN - Dr. Lang Fafa Dampha, Senior Research and Program Officer, and acting Executive Secretary; Dr. Ojo Babajide Johnson, Senior Program and Project Officer; and Kossi Abassa, Finance and Administrative Officer.

ACALAN's office is now in the Hamdallaye ACI quarter of Bamako - when I previously visited it in May 2008, its office was in Koulouba. The organization is evidently rebuilding, with a search underway for a new executive secretary and other staff to hire. They were also preparing for a two-day meeting of the Technical and Scientific Committee (held 8-9 December), one of the major organs of ACALAN.

Other working structures of ACALAN include Vehicular Cross-Border Language Commissions for 12 African languages (a structure I mentioned on this blog 3 years ago), and the national language agencies in the various African countries.

They also have a number of projects including on terminology and lexicography, the linguistic atlas of Africa, cyberspace, interpretation and translation, collection of stories, and a graduate program in applied linguistics. While ACALAN is headquartered in Bamako, its Pan-African Center of Interpretation and Translation, and the Terminology and Lexicography Project are based in Dar-es-Salaam, Tanzania.

And ACALAN publishes various materials and reports, as well as an academic journal called Kuwala.

The meeting we had last month will hopefully facilitate collaboration in the future. 

It is also worth noting that this year, which we are about to conclude, marks one decade since the ACALAN-sponsored Year of African Languages. That year, 2006, was also when ACALAN (founded in 2001) formally became the African Union's specialized language agency.

Wednesday, December 28, 2016

Mabati-Cornell Kiswahili Prizes 2016

The second annual Mabati-Cornell Kiswahili Prizes for African Literature were awarded earlier this month at Cornell University in Ithaca, New York. As discussed previously on this blog, the Mabati-Cornell is the only literary award going to writers publishing in African languages.

Mabati-Cornell, which was founded in late 2014 by Cornell faculty member Dr. Mukoma Wa Ngugi and Caine Prize for African Writing director Dr. Lizzie Attree, recognizes literature in the Swahili language. Its sponsorship by the Kenyan company Mabati Rolling Mills led Mukoma Wa Ngugi to state that "the prize sets an historical precedent for African philanthropy by Africans and shows that African philanthropy can and should be at the centre of African cultural production."

This year's prizes (announced on 14 Dec. 2016) went to:
  • Idrissa Haji Abdalla (Tanzania), for Kilio cha Mwanamke (fiction #1)
  • Hussein Wamaywa (Tanzania), for Moyo Wangu Unaungua (fiction #2)
  • Ahmed Hussein Ahmed (Kenya), for Haile Ngoma ya Wana (poetry)
The 2015 Mabati-Cornell Kiswahili prizes went to:
  • Anna Samwel (Tanzania), for Penzi la Damu (fiction #1)
  • Enock Maregesi (Tanzania), for Kolonia Santita (fiction #2)
  • Mohammed K. Ghassani (Tanzania), for N'na Kwetu (poetry #1)
  • Christopher Bundala (Tanzania), for Kifaurongo (poetry #2)

Friday, December 02, 2016

Quick comments on language in Mali today

In transit on the way back from a quick 3 weeks in Mali as part of a short-term consultancy with the Mali Justice Project. More on the project at another time, hopefully, but here are some quick (and unfortunately superficial) observations regarding language in Mali while they're still fresh in the memory.

This was my first time in Mali since 2008. Bamako has grown considerably, and from observations and descriptions, there is a much larger urban middle class. However, that does not seem to have been accompanied by a shift to French in everyday language (as one might see in some other cities in Francophone states). Bambara seems to be spoken everywhere, with French and occasional other languages as well.

For an obvious foreigner, efforts to use Bambara are generally met positively or matter-of-factly (to the extent one's accent hasn't obscured the fact that one is using the language). That was great, though I admit it got almost disconcerting to have airport security shift out of role to banter with the toubab speaking broken Bambara.

Only one real opportunity to speak Fulfulde, and that with a colleague in the project office. I found though that the whoosh of Fulfulde I was able to call up (to my surprise) was hard to turn off at first. Part of that is shifting between too many languages for a former monolingual (even crossed wires between Bambara and Chinese once - an old occasional lapse I ascribe to the similar structures of the languages and how I think my brain handles them).

In traveling outcountry to Segou and Sikasso, I also found Bambara easy to use in various settings. With Segou that is expected, since it is ethnically Bambara (and the center of a major precolonial Bambara kingdom), but even in Sikasso, the major city in the ethnically Senufo/Minianka region of Mali, there was no problem speaking Bambara with anyone from small merchants to heads of services. In fact, a 2-day project meeting of people involved in commerce, transportation, and services de contrôle in Sikasso worked mainly in Bambara after starting in French.

Not much signage in Bambara, though on the road to Segou I did notice a couple of signs in N'Ko (wasn't able to get photos, sorry). Orange - the phone company - had a TV ad with "I ni tié" (i ni cɛ ≈ thank you). In what written Bambara I saw during such a short stay, which generally accompanied French, Frenchified spellings were frequent.

Also had the chance to visit the ACALAN offices - more about that in another post.

Saturday, November 19, 2016

Some illustrated Senufo proverbs

"Yì fwù ɲara na" (welcome)
On the margins of some other research, I was recently able to pay a visit to the Centre de Recherche pour la Sauvegarde et la Promotion de la Culture Sénoufo (CRSPCS) in Sikasso, Mali. Although the purpose of going there was not primarily language-related, it is worth noting that among the CRSPCS's areas of activity are research and publication on the Senufo language(s) spoken in southern Mali, northern Ivory Coast, and western Burkina Faso.

The center was founded in 2005, the result of an effort begun by Rev. Emilio Escudero Yangüela. It serves educational roles, including in coordination with the regional museum in Sikasso, and has collected cultural objects that are on display. (A short video in French that evidently aired on Malian TV gives a more complete introduction.)

In a tour of the grounds and buildings housing collections - for which we thank Mr. Elie Yaya Bambe - one notices several outside walls that are decorated with proverbs and illustrations. Some of these follow:
Dù fànŋà kà mà sâ, ká mū dí ŋkwōō wólò, kìmàhā mpɔ́rɔ́ mūnā.
If a donkey gives you a kick, and you reply in kind, it is better than you.

Ná mū sí kàcɛ̀nnɛ̀ pyǐ kùnùŋɔ́nā, mùmàhā kǐyǎhǎ ywɔ̌hɔ̌ ɲɔ́ná.
If you want to be good to the tortoise, put it close to the water.

Kùtùnɔ̌ ká ncyɛ̌ jyègěě kǎnɛ̀ŋɛ̀nǐ, ŋkórò kǐyɛ̀.
If the monkey refuses to enter into the dew, it ends up being all alone.

Mūhà bìmâ lē mā khɔ́hɔ́ɲɛ́ɛ́n ɲìī nī, mū màhā ŋkhɔ̀hɔ̀lì mā yɛ̀.
If you put dust in the eyes of your dancing partner, you dance alone.
I did not get a clear answer about which Senufo language these proverbs are written in, but the main Senufo variety in Sikasso is Supyire. (The tone markings as seen in the paintings are reproduced as best as possible as text in the captions; English translations from the French translations. Corrections of course welcome.)

Senufo proverbs, riddles, and tales

The CRSPCS has not published Senufo proverbs, but it has produced small books of riddles and tales in Senufo with parallel French text. One riddle from Devinettes Sénoufo, Vol 1 (Elie Yaya Mpê Bamba and Bernard Delay, eds., Collection "Wu Nire," Harmattan Burkina, 2015):
Ŋuni a tɔɔn, tɛgɛlɛ bàa. (It is so long that it has no end.)
Kudo. (A path.)
Note the more sparing use of tone marks. Judging from the CRSPCS publications I saw, and from an online dictionary of Mamara/Minianka (another Senufo language), tone marking in Senufo writing may generally be more sparing than what one sees in the proverbs above.

For more proverbs, there is a collection published by Timothy F. Garrard in 2001 as La sagesse d'un peuple : 2000 proverbes Senoufo (link to description; this work is not yet available online).

Monday, October 10, 2016

"Wogbɛ Jɛkɛ" & Ghanaian language input support

Came across mention on Twitter of the Ghanaian play "Wogbɛ Jɛkɛ - A Tale of Two Men" but with the Ga words in the title written "Wogb3 j3k3":
In fact, looking at Twitter and at the web via a Google search, one notes both this workaround and the correct spelling, as well as the ASCIIfied version, "wogbe jeke."

7 vowels and a 5 vowel keyboard

Ga, a Ga-Dangme language of southernmost Ghana, has a complex vowel system, with seven vowels distinguished in its writing system: a, e, i, o, and u, plus ɛ ("open e") and ɔ ("open o"). The latter two are used to write many other African languages such as Akan, Ewe, Mende, Bambara, and Lingala.1 (These characters, like a number of other Latin letters, are also in the International Phonetic Alphabet.)
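For anyone setting up fonts or input tools, it may help to see that ɛ, ɔ, and the ŋ also used in Ga are separate Unicode characters, not accented forms of e, o, and n. The short Python sketch below is only an illustration using the standard `unicodedata` module. It prints the code point and official Unicode name of each small letter and of the capital that Python's built-in case mapping gives for it.

```python
import unicodedata

# Letters in Ga's alphabet beyond the basic Latin a-z:
# open e, open o, and eng.
GA_EXTRA_LETTERS = ["ɛ", "ɔ", "ŋ"]


def describe(ch: str) -> str:
    """Return 'U+XXXX NAME' for a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


for small in GA_EXTRA_LETTERS:
    capital = small.upper()  # Unicode case mapping, e.g. ɛ -> Ɛ
    print(f"{small} {describe(small)}  |  {capital} {describe(capital)}")

# ɛ U+025B LATIN SMALL LETTER OPEN E  |  Ɛ U+0190 LATIN CAPITAL LETTER OPEN E
# ɔ U+0254 LATIN SMALL LETTER OPEN O  |  Ɔ U+0186 LATIN CAPITAL LETTER OPEN O
# ŋ U+014B LATIN SMALL LETTER ENG  |  Ŋ U+014A LATIN CAPITAL LETTER ENG
```

Having the code points at hand is useful when a keyboard utility or a web form asks for characters by number rather than by shape.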

Many fonts include ɛ and ɔ, but typing them is not facilitated by standard keyboards. There are keyboard layouts specially designed for Ga (see below for a list), as well as for Akan, Ewe, and others. However, there apparently are no keyboards that enable multilingual input, such as typing an Akan title in a tweet in English. Or if there are, they are not widely used. Hence the resort to "3" for "ɛ" and ")" (the right parenthesis) for "ɔ."

In African Languages in a Digital Age (p. 61) I outlined several workarounds for text including extended Latin characters not supported in fonts or input systems, a summary that was a revision of something published a decade earlier.2 I had not, however, noted the use of numbers or symbols among the "substitution solutions." Ade Sawyerr, who has worked with Ga input issues, mentions observing these particular substitutions - "3" and ")" - as well as others, such as "rj" for the letter "ŋ" ("eng"), which is also used in Ga.
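Text typed with these substitutions is hard to search or index alongside correctly spelled text. The Python sketch below is a hypothetical cleanup step, not a tool the people quoted here use. It maps "3" back to ɛ and ")" back to ɔ, and optionally "rj" back to ŋ, in the spirit of the observations above. It assumes a simple heuristic: "3" is only treated as ɛ when it follows a letter, and ")" only when it sits between two letters, so ordinary numbers and closing parentheses survive. A word-final ɔ typed as ")" would therefore be missed, and strings like "MP3" would be wrongly changed.

```python
import re

# Workaround characters reported for Ga text and the letters they stand for.
SUBSTITUTIONS = {"3": "ɛ", ")": "ɔ"}

# Heuristic (an assumption): "3" after a letter, or ")" between two letters.
# [^\W\d_] matches any Unicode letter in Python's re module.
_PATTERN = re.compile(r"(?<=[^\W\d_])3|(?<=[^\W\d_])\)(?=[^\W\d_])")


def restore_ga_letters(text: str, fix_rj: bool = False) -> str:
    """Undo common keyboard workarounds in Ga text (heuristic sketch)."""
    text = _PATTERN.sub(lambda m: SUBSTITUTIONS[m.group(0)], text)
    if fix_rj:
        # Off by default: "rj" may occur legitimately in other words.
        text = text.replace("rj", "ŋ")
    return text


print(restore_ga_letters("Wogb3 j3k3 (a play), 2016"))
# Wogbɛ jɛkɛ (a play), 2016
```

The conservative rules trade recall for safety: it is better to leave some workarounds unfixed than to corrupt real punctuation or numbers.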

In any event, the resort in the mid-2010s to 3's and )'s when typing words in languages like Ga, Akan, and Ewe that use ɛ and ɔ is evidence of missing input options on the devices used, inconvenience of existing options, or perhaps lack of awareness of available keyboard apps on the part of users.

Some keyboard layouts for Ga

Over the last couple of decades, and especially since the availability of keyboard utilities like Keyman and Microsoft Keyboard Layout Creator (MSKLC), many keyboard layouts have been developed for languages such as those of Ghana that have extended Latin orthographies. A full discussion is beyond this blog post, but generally speaking, keyboards incorporating characters not on standard computer keyboards work either by changing key assignments (for example, since "q" is not used in Ga, "ŋ" can be assigned to that key) or via a combination or sequence of keystrokes. The changed-key solution seems to be more common in mobile device applications, whereas both approaches are found in keyboard layouts used on computers.
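The difference between the two approaches can be sketched in a few lines of Python. This is an illustration only and does not reproduce any published Ga keyboard. The "q" to "ŋ" reassignment comes from the example above. The ";" prefix key used for sequences is a hypothetical choice made for this sketch.

```python
# Approach 1: reassign a key the language does not need
# ("q" -> "ŋ", as in the example in the text).
REMAPPED_KEYS = {"q": "ŋ", "Q": "Ŋ"}

# Approach 2: a prefix keystroke followed by a base letter.
# The ";" prefix and these pairs are hypothetical.
PREFIX = ";"
SEQUENCES = {"e": "ɛ", "o": "ɔ", "n": "ŋ", "E": "Ɛ", "O": "Ɔ", "N": "Ŋ"}


def type_with_remap(keys: str) -> str:
    """Each keystroke yields exactly one character, possibly reassigned."""
    return "".join(REMAPPED_KEYS.get(k, k) for k in keys)


def type_with_sequences(keys: str) -> str:
    """A prefix keystroke changes how the next keystroke is read."""
    out = []
    pending = False
    for k in keys:
        if pending:
            # Known pair -> special letter; unknown pair -> keep both keys.
            out.append(SEQUENCES.get(k, PREFIX + k))
            pending = False
        elif k == PREFIX:
            pending = True
        else:
            out.append(k)
    if pending:  # a trailing prefix with nothing after it
        out.append(PREFIX)
    return "".join(out)


print(type_with_remap("q"))                  # ŋ
print(type_with_sequences("Wogb;e J;ek;e"))  # Wogbɛ Jɛkɛ
```

Remapping needs no extra keystrokes but gives up a key, which is awkward when the same keyboard is also used for English. Sequences keep every original key but cost an extra keystroke and something to remember, which may be one reason touchscreen apps favor remapping.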

Kasahorow Android keyboards menu selection
A selection of Ga keyboards:
There likely are others for Ga (and the closely related Dangme). There definitely are a number for other languages of Ghana such as Akan (or its varieties, Twi Ashanti, Twi Akuapem, and Fante), Ewe, and Dagaare.

However, more could be done to facilitate multilingual typing, so that one doesn't have to switch keyboards or keep track of key sequences to insert something like Wogbɛ Jɛkɛ in an English tweet or, say, a Hausa word with a hooked letter in a text in Akan (hooked letters are not part of the Akan orthography). Could, for example, an extra row of keys be added to touchscreen keyboards (say, on a Ghana English keyboard) with the extra characters needed for Ghanaian languages?
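One way to think about such an extra row is as the union of the non-ASCII letters each target language needs, with each letter listed once. The Python sketch below assumes small, illustrative character sets (lowercase only, no tone marks). A real "Ghana English" layout would need input from speakers and from each language's orthography standard.

```python
# Illustrative (not complete) sets of letters beyond basic Latin a-z.
EXTRA_LETTERS = {
    "Ga": "ɛɔŋ",
    "Akan": "ɛɔ",
    "Ewe": "ɖƒɣŋɔʋɛ",
    "Hausa": "ɓɗƙ",  # hooked letters
}


def extra_rows(languages, per_row=10):
    """Merge the letters the given languages need, once each and in
    first-seen order, then split them into rows of at most per_row keys."""
    keys = []
    for lang in languages:
        for ch in EXTRA_LETTERS[lang]:
            if ch not in keys:
                keys.append(ch)
    return [keys[i:i + per_row] for i in range(0, len(keys), per_row)]


print(extra_rows(["Ga", "Akan", "Ewe"]))
# [['ɛ', 'ɔ', 'ŋ', 'ɖ', 'ƒ', 'ɣ', 'ʋ']]
```

Because the letters overlap heavily, even adding Hausa's hooked letters to this illustrative set still fits in a single row of ten keys.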

About "Wogbɛ Jɛkɛ"

Wogbɛ jɛkɛ is a Ga term with meanings of "we have come from far" and "our journey is still long." It is used in the titles of two plays written by Chief Abdul Moomen Muslim about historical events, beginning with "Wogbɛ Jɛkɛ: Birth of a Nation," which depicts the pre-colonial history of what is now Ghana, followed by "Wogbɛ Jɛkɛ: The Tale of Two Men," which centers on the stories of J.B. Danquah and Kwame Nkrumah during Ghana's independence struggle.

1. Some Nigerian languages like Yoruba and Igbo instead use sub-dotted characters, ẹ and ọ, for these vowels.
2. Don Osborn, 2001, "The knotty problem of using African languages for e-mail and internet," Balancing Act News Update, 69.

Friday, September 30, 2016

Internationalizing computer science in Africa

Last year I posted on whether Unicode and internationalization (i18n) are included in any computer science curriculum in Africa. A recent comment on that post by Andre Schappo, asking whether there are any organizations in Africa promoting internationalization of university curricula more generally, offers another angle from which to approach this issue.

Part of Unicode charts for Ethiopic/Ge'ez
Andre's question follows a post on his blog about two organizations that promote internationalization of teaching curricula, one in the UK and the other in Australia. Depending on how one defines promotion of internationalization in higher education, one might add many other initiatives and consortia that seek in one way or another to develop and support international or global studies. The degree to which such efforts overlap with or might affect the content of computer science courses is an interesting question. In my limited experience, international/global studies mainly address disciplines in other areas (social sciences, humanities, certain applied disciplines). It certainly is worth asking how a program of internationalization at a university would apply to computer science, and seeing how the discussion goes.

However, in the case of Africa - and also Asia - internationalization of the computer science curriculum would seem to follow as much from attention to localization as to international and global perspectives.

In any event, the question of how Unicode and i18n figure in computer science instruction, worldwide as well as in Africa, is important for technical and language planning reasons, as well as for the same reasons that motivate attention to internationalization in higher education generally.
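As an illustration of the kind of material such instruction might cover, here is a short Python sketch of two basics. The examples are chosen for this post and are not drawn from any existing curriculum. First, the length of a string depends on whether one counts code points or UTF-8 bytes: ɛ takes 2 bytes, and each Ethiopic syllable takes 3. Second, the same visible letter can be stored in two ways, and Unicode normalization (NFC here) is needed before comparing them. Yoruba ẹ is used as the example.

```python
import unicodedata


def sizes(s: str) -> tuple:
    """Return (code points, UTF-8 bytes) for a string."""
    return len(s), len(s.encode("utf-8"))


print("Wogbɛ (Ga):", sizes("Wogbɛ"))    # (5, 6): ɛ needs 2 bytes
print("ሰላም (Amharic):", sizes("ሰላም"))  # (3, 9): 3 bytes per syllable

# ẹ can be one precomposed character (U+1EB9) or e + COMBINING DOT
# BELOW (U+0323). They look the same but compare unequal until normalized.
precomposed = "\u1eb9"
decomposed = "e\u0323"
print(precomposed == decomposed)                                # False
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
```

Both points affect everyday tasks such as enforcing field lengths, searching, and sorting text in African languages, which is part of why they arguably belong in a computer science curriculum and not only in specialist localization work.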