Wikimedia blog

News from inside the Wikimedia Foundation

Internationalization and localization

Addressing the many

When you have a message, you use the appropriate language and tools to address multiple people. We do not use our eyes to see how many people we address and we do not use a bullhorn to be heard. Our MediaWiki software knows the numbers involved, and a plural-enabled message will be formed according to the rules of the language.

When we implemented plural support for JavaScript, we checked the new implementation against our existing implementation in PHP and against the standard for such things, the CLDR.

The Localisation team does not know the language rules for the 280+ languages that have a Wikipedia. We prefer to implement what the standard tells us, but we support more languages than the CLDR does. We want to channel our need for support through “Language Support teams”, and we want them to help us understand and fix the inconsistencies and add the missing information to the CLDR.

Inconsistencies with the CLDR
  • Belarusian – ‘other’ form missing in MediaWiki
  • Belarusian-tarask – ‘other’ form missing in MediaWiki
  • Bosnian – ‘other’ form missing in MediaWiki
  • Manx – CLDR has 3, MediaWiki has 4 forms
  • Hebrew – CLDR has 2, MediaWiki has 3 forms
  • Croatian – ‘other’ form missing in MediaWiki
  • Ripuarian / Colognian – order of forms different. CLDR says 0, 1, other. MediaWiki says 1, other, zero
  • Latvian – CLDR defines zero, one and other forms. MediaWiki has only two forms, one for (1, 21, 31, 41, 51, 61…) and another for the rest.
  • Macedonian – CLDR defines forms[0] for n!=11. MediaWiki defines forms[0] for n%100!=11
  • Polish – ‘other’ form missing in MediaWiki
  • Russian – CLDR defines 4 plural forms; the form for decimals is missing
  • Slovenian – MediaWiki defines a zero form which is not present in CLDR

Missing in CLDR
  • Church Slavonic
  • Lower Sorbian
  • Scottish Gaelic
  • Upper Sorbian
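
The Macedonian entry above shows how small such a difference can be. The following is a minimal sketch, not the MediaWiki or CLDR code: the function names are hypothetical, and it assumes that both rules apply to numbers ending in 1 and differ only in the exception quoted in the list (n != 11 versus n % 100 != 11). It compares the two readings over a range of numbers, much as one implementation can be checked against another.

```python
from typing import Callable, List, Sequence

PluralRule = Callable[[int], int]  # maps a number to the index of a plural form


def macedonian_cldr(n: int) -> int:
    # Assumed reading of the CLDR entry: form 0 when n ends in 1 and n is not 11.
    return 0 if n % 10 == 1 and n != 11 else 1


def macedonian_mediawiki(n: int) -> int:
    # Assumed reading of the MediaWiki entry: form 0 when n ends in 1
    # and n % 100 is not 11.
    return 0 if n % 10 == 1 and n % 100 != 11 else 1


def disagreements(a: PluralRule, b: PluralRule, limit: int) -> List[int]:
    """Return every n below limit for which the two rules pick different forms."""
    return [n for n in range(limit) if a(n) != b(n)]


def choose_form(rule: PluralRule, n: int, forms: Sequence[str]) -> str:
    """Pick the form for n; fall back to the last form if too few are supplied."""
    return forms[min(rule(n), len(forms) - 1)]


if __name__ == "__main__":
    # Placeholder form names, not real Macedonian words.
    forms = ["form-0", "form-1"]
    for n in disagreements(macedonian_cldr, macedonian_mediawiki, 400):
        print(n, choose_form(macedonian_cldr, n, forms),
              choose_form(macedonian_mediawiki, n, forms))
```

Under these assumptions the two rules agree on 1, 11 and 21 and only part ways at 111, 211, 311 and so on, which is why such inconsistencies are easy to miss without a systematic comparison.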

Please make a difference to the support for your language and join its Language Support team.

Thanks,
Gerard Meijssen
Internationalization / Localization outreach consultant

End of sprint 6; Translate and other goodies

Every two weeks a sprint and every week a deployment. The Localisation team aims to bring you new and updated functionality when we have it.

As you can see in the summary below, the focus this sprint has been very much on the Translate extension. Management of translations and the translation process is what we have worked on. When texts are translated in a wiki, they are often only needed within a specific time frame; it is now possible to mark a text as no longer needing any effort. For many languages there are multiple people involved in the workflow for the creation of a document that is well written in translation. If they are to work well together, it helps when their work changes its state so that it is clear that, for instance, something has been proofread.

The person who manages the publication and distribution of a page needs workflow states to decide what more needs to be done and what is ready. To do this they can make use of states that already exist or define additional states. These states are available as local messages and are available for translation.

Translate extension features

  • Message workflow states help translators translate, review and prepare messages for publication
  • There is now a new message group for recent translations. This message group makes these states possible in translation
  • Special:MyLanguage can now be used with language subpages, which serve as the default fallback instead of an untranslated version
  • Pages marked for translation can now be marked as “discouraged”. They will no longer show up in the usual places. This prevents translators from translating them needlessly.
  • Added {{#translationdialog:title}} for creating a link to the translation dialogue

Translate bug fixes

  • The flash of unstyled content effect is reduced
  • Made the extension work without legacy JavaScript globals
  • The summary row in Special:LanguageStats and Special:MessageGroupStats is no longer sorted with the rest of the rows.
  • Fixes to the sizing of the translation editor dialogue
  • Fixed a fatal error that sometimes occurred when a translation page title used GRAMMAR and the page was viewed with the English UI.

Miscellaneous changes

  • The Italian translation of the ParserFunctions ifexist magic word was fixed to ‘ifexist’
  • The Narayam preference wording changed from disable to enable
  • The WebFonts icon no longer overlaps with the menu text
  • WebFonts preview allows you to preview a text with a font. You can download these freely licensed fonts to your system.
  • GENDER and PLURAL support are now available for use in JavaScript.
  • Consistency updates for grouppage-* messages, for LocalisationUpdate
  • Fixed be-tarask grammar forms

Changes deployed last week

  • WebFonts was deployed for the Bishnupriya Manipuri language; it uses the Lohit Bengali font
  • Support for gendered namespaces was deployed for the Russian wikis.

As always, you are welcome to have a look at our sprint backlog (user:guest password: guest) and bug us in bugzilla with whatever needs fixing.

Thanks,
Gerard Meijssen
Internationalization / Localization outreach consultant

The localisation team sprints into the new year..

WebFonts is the first extension that gets user documentation served from MediaWiki.org. At the time of writing, the documentation has been written, it does serve people with help text about WebFonts and it is ready for translation. People looking for help will be served help in the language of their user interface if there is a translation.
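
Serving help in the language of the user interface when a translation exists can be sketched as a simple lookup. This is only an illustration and not the way MediaWiki implements it: the function name, the "title/language code" subpage convention and the page titles used here are assumptions made for the example.

```python
from typing import Set


def pick_help_page(base_title: str, ui_language: str,
                   existing_titles: Set[str],
                   fallback_language: str = "en") -> str:
    """Return the help page to show for a user interface language.

    Tries the subpage for the user's language first, then the subpage for
    the fallback language, and finally the untranslated base page.
    """
    for language in (ui_language, fallback_language):
        candidate = f"{base_title}/{language}"
        if candidate in existing_titles:
            return candidate
    return base_title


if __name__ == "__main__":
    # Hypothetical set of pages that exist on the wiki.
    pages = {"Help:WebFonts", "Help:WebFonts/en", "Help:WebFonts/or"}
    print(pick_help_page("Help:WebFonts", "or", pages))
    print(pick_help_page("Help:WebFonts", "nl", pages))
```

A reader with an Oriya interface gets the Oriya page, while a reader whose language has no translation yet falls back to the English one.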

WebFonts drop down on or.wikipedia.org

In a way it seems like a minor thing, but consider:

  • MediaWiki can serve help texts for its functionality
  • this help text may differ based on the language of the user
  • the help text can be translated
  • a new community for MediaWiki help text translation is needed
  • functionality like Narayam will surely get its user documentation in the near future

It will be a challenge to other developers and developer teams to adopt and refine the way assistance to our users is provided. We learned at translatewiki.net that documentation did improve the quality of the localisations. We hope that user documentation will reduce confusion and make for happy editors and readers.

The WebFonts user documentation was deployed last Tuesday. This and some other changes can be found in the deploy list. As the holiday season is in full swing, sprint 6 has started; it will run into the new year.

In this sprint, stories will be developed that will make the “Translation review” feature complete. When this is implemented, it will help translators and localisers review each other's work and assign a status to their work for further consideration. As you can imagine, the different statuses themselves will become available for translation; card 326 defines this and will make this possible. This is just one of many stories that make up this feature.

For the localisers of the MediaWiki software a long-held ambition will be realised; card 206 will see “plural” support implemented for JavaScript. When this functionality is deployed, it will lead to a long list of follow-up changes to the actual messages.

The new year will bring many new challenges and opportunities to the many, many language communities. The Wikimedia Localisation team will work hard to provide you with the tools to be efficient in any language, to get our message out and to provide information in any language. For some of us the new year starts at a different moment, so it will be very much business as usual; we welcome you to have a look at our sprint backlog (user:guest password: guest) and bug us in bugzilla with whatever needs fixing.

Thanks,
Gerard Meijssen
Internationalization / Localization outreach consultant

 

Localisation team sprint 5 update II

Probably the most interesting highlight of today’s i18n deployment is the configuration of the Translate extension on MediaWiki.org. We have observed that on some wikis special pages exist that explain, in the language of the wiki, functionality like Narayam or WebFonts. Such documentation is welcome on all MediaWiki installations where the functionality is used by people using the same language for their user interface.

For writing the documentation MediaWiki.org is the obvious platform. With the deployment of Translate we have the basis for writing and translating user documentation in a structured and organised way.

Narayam and WebFonts have been updated to the latest versions that have been tested on translatewiki.net. As Narayam and WebFonts are still very much a work in progress, we invite anyone to continue their testing at translatewiki.net. The changes are:

  • menu appears only on click, not when hovering
  • menu positions are now correct for RTL languages and do not go off screen any more
  • Narayam and WebFonts support the Kannada script for the Tulu language on the Incubator

There are also some smaller fixes, among them the change of the autonym for the Veps language to “Vepsän kel”. The full details of all the changes are at revision 106667.

Thanks,
Gerard Meijssen
Internationalization / Localization outreach consultant

 

Localisation team sprint 5 update

With a new sprint, new functionality for MediaWiki is identified to be deployed in two weeks' time. There is room for dealing with issues to do with Narayam and WebFonts. Many of the new activities have to do with documentation, translation and feedback.

The sprint backlog in Mingle (user: guest password: guest)

What we hope for is that the feedback functionality that is now part of MediaWiki can be used to ask for feedback on MediaWiki features. It is obvious that the Wikimedia Localisation team cannot support all the 300+ languages that have their projects or exist in the Incubator. What we can do is process the information we get from our language support teams. Figuring out how to do this is one of the goals for this sprint.

The use of Narayam and WebFonts will be helped a lot by documentation; “where to find that character on this keyboard mapping” or “what does an international keyboard look like” are questions looking for an answer. Determining how to document and what to translate is not all that obvious. With keyboard maps and fonts distributed as part of MediaWiki, documenting on “the” wiki does not scale to other Wikimedia wikis, and MediaWiki wikis outside the Wikimedia Foundation are as much in need of documentation. When people start using MediaWiki because of such language support features, we accomplish real support for a language.

For this sprint, these questions are looking for an answer, and in the meantime the Translate extension will gain these new features:

  • Documents that need translation can be grouped together; for instance all the Fundraiser messages or Wikimedia reports
  • Documents can be marked as no longer needing translation
  • Changes to the state of documents and translations will be logged and the log will be available for viewing
  • Depending on the state of a document or a translation, attention can be drawn when there is a need for activity

User documentation needs translation and hopefully many of the algorithms used for the localisation of MediaWiki at translatewiki.net will equally apply for user documentation. Life will become a lot easier for all those people who administer MediaWiki and have only a basic understanding of English. We hope to deliver this in one of our future sprints.

Thanks,
Gerard Meijssen
Internationalization / Localization outreach consultant

Localisation team updates going live, December 12 2011

Every Monday, the Wikimedia Localisation team has a window of opportunity to roll out new and improved functionality. This release is at the end of an Agile sprint and it reflects the stories that our developers committed to develop at the start of the sprint. Multiple stories mean that what is delivered can and does cover different functionalities; today is no different:

  • It features the launch of WebFonts for selected Indic languages and projects
    • All Assamese, Bengali, Gujarati, Hindi, Kannada, Marathi, Nepali, Oriya, Punjabi, Sanskrit, and Telugu wikis
    • The Malayalam and Tamil wikis will not be supported by WebFonts for now
  • Narayam has several new keyboard methods, more mappings, an improved UI and support for the Modern and MonoBook skins
  • Bug 31330 changes the preference to Babel extension information
    • this improves the coexistence of Babel information in templates and the extension
  • Text cropping issues in the headers for many Indic languages have been solved

When you frequent translatewiki.net, you will have seen it all. When you follow the Bugzilla bugs for Internationalization, you may have commented on the issues that are finding a resolution. For most Wikimedians the existence of all this hardly registers; it does not affect their language, their community. When it does affect their language, their community, it is very much a road towards editing in their language as easily as it is to edit in English.

We are eager to learn about any issues on our IRC channel. Bugs are best reported at Bugzilla.

Thanks,
Gerard Meijssen
Internationalization / Localization outreach consultant

Ready for the WebFonts launch

After months of preparation, demonstrating the latest versions in person and online, going through tons of feedback and implementing the resulting modifications, we are ready for the launch of WebFonts. Web fonts is a technology that ensures that the readers of our wikis will always see the intended characters on their screen. Many devices do not provide the necessary fonts that allow people to read their mother tongue.

When people do not even see what we aim to provide to them, we fail. According to the Wikipedia article, web fonts are considered “controversial” because the licenses of many fonts prevent them from being used as web fonts. There is no such controversy when freely licensed fonts are used and we are really happy with our collaboration with the producers of such fonts. We learned that fonts working on one platform do not necessarily work as well on another platform / operating system.

Enabling people to read and to write their language is at this time our prime objective, and people are happy when they find they can, as they did at the localisation sprint in Pune. Being able to type Marathi or Punjabi, Hindi or Tamil on a thin client put a smile on many faces. They used the latest software at translatewiki.net and the feedback we got from them and others has resulted in many technical and usability improvements.

The launch of WebFonts together with the Narayam improvements on Monday 12 December represents significant progress in helping enable Indic language contributions to our projects; it consists of a large amount of code, it will be implemented on a selected range of wikis and it affects many communities. It will affect them and the wikis in Assamese, Bengali, Gujarati, Hindi, Kannada, Malayalam, Marathi, Nepali, Oriya, Punjabi, Sanskrit, Tamil and Telugu. All these communities have been involved in testing the evolving functionality at translatewiki.net and the comments and bug reports we received were essential for what we are now proud to present. With the launch more people will experience the WebFonts technology for the first time. We are eager to improve on what we have because we believe that the web fonts technology is crucial for the emancipation of many languages and scripts in this digital age.

Thanks,

Gerard Meijssen

Internationalization / Localization outreach consultant

Supporting the languages of India

India is different. Given that India is very strategic for the Wikimedia Foundation, the question is what we can do to raise the profile of our projects and what we can do to support the Indic languages effectively.

Many well-educated people, people with a university-level education, are effectively illiterate in their own language. For them a Wikipedia in their own language does not tempt them to get involved. They do not have the skills, even though it would not be that hard for them to learn to read and write their mother tongue. What really helps is that writing the Indic languages is made easier in two ways: the scripts are really phonetic, and InScript, the dominant keyboard layout for Indic languages, ensures that the same sound is always in the same place.

When our goal is to get more people involved in the Indic languages, we can ask people to transcribe the scans of public domain books. We will be providing them with a keyboard mapping and the fonts that show their language. As these “illiterates” recognise the characters and reproduce them digitally, they learn not only to type their language; they may even learn to read. When we recognise their effort in a thank-you note accompanying the book, experience teaches us they are likely to help us in future projects.

The project that is already making a big impact in India in this way is the Malayalam Wikisource project. They published a CD with a year's worth of sources and distributed it to the schools of Kerala. They produce software that ensures that the content looks really good. The software as well as the content is available on the internet, but sadly this full experience cannot be had on Wikisource itself.

When a new book becomes available, the Malayalam press often mentions this in their periodicals, so much so that Wikisource is mentioned more often in the press than Wikipedia.

 

 

Similar projects for other Indic languages have been a popular topic at the WikiConference India; it was discussed at least for Sanskrit and Tamil. The discussion was not only about the organisation of such a project but also about internationalising the software that prepares the final product and about using Kiwix for presenting it. When you consider how much literature is available in the Indic languages that is already in the public domain, this is a project that will run and run.

Preparing sources in Wikibooks or Wikisource in a collaborative way makes sense in a wiki. Once the work is done, however, the content can be published in all kinds of formats. This is important because we want it to be read as widely as possible; this is how we optimally realise our objectives.

Jimmy was right when he said in his speech that the Indic language communities can learn from each other and do really well. However, these best practices can be applied to any Wikisource or Wikibooks.

Thanks,
Gerard Meijssen
Internationalization / Localization outreach consultant

Hackathon Mumbai has started

Concurrent with the WikiConference India a hackathon has been organised. At the Mumbai hackathon many Wikimedia developers are present, but there are many, many more Indian developers. The one thing that is quite funny is that when you ask them “what language do you speak”, they say that it is English. Only when you ask “do you speak any other languages?” do you learn “eh, Hindi, Marathi, Tamil..”

Obviously a hackathon is not only for language support, far from it, but there will be a lot of development on the things that tie in with the functionality developed by the Localisation team for MediaWiki, like input methods, web fonts and maybe even transliterations between the scripts used by languages like Konkani or Punjabi.

Hackathons are powerful; they help raise awareness that there is not only an “edit button” but that you can also work on the code and help determine what MediaWiki and consequently Wikipedia may be.

Thanks,

Gerard Meijssen
Internationalization / Localization outreach consultant

 

The Wikimedia Foundation's terms of use .. in translation

When you make use of any of the projects of the Wikimedia Foundation, you are expected to abide by its terms of use. These terms of use provide you with practical and legal terms of reference. The original version is in English but we do know that for many in our communities English is not a language that will convey any message.

For this reason the translation of the terms of use is essential. There is a recurring need for the translation of texts and this translation work is done by volunteers. This work is really important to get our message out; making it as easy and efficient as possible is one way of showing our appreciation for the work that these volunteers do.

Translation is made easy because the user interface will just work in the language set in the preferences.

 

Details like the languages that have a translation are all shown in the language set in the preferences.

Even with the best preparation, a text may change over time. As the text is broken into separate fragments that need translation, it is possible not only to indicate what needs to be revisited by a translator but also to show the changed text in pink in the readable text.
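
How breaking a text into fragments makes it possible to point at exactly what needs revisiting can be sketched as follows. This is a minimal illustration and not the Translate extension's implementation: the function names, the fragment identifiers and the use of a hash to remember which source text a translation was based on are all assumptions made for the example.

```python
import hashlib
from typing import Dict, List


def digest(text: str) -> str:
    """Fingerprint of a source fragment, stored when it is translated."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def outdated_fragments(source: Dict[str, str],
                       translated_from: Dict[str, str]) -> List[str]:
    """Return ids of translated fragments whose source text has since changed.

    source maps fragment id to the current source text.
    translated_from maps fragment id to the digest of the source text
    the existing translation was made from.
    """
    return [fragment_id for fragment_id, text in source.items()
            if fragment_id in translated_from
            and translated_from[fragment_id] != digest(text)]


if __name__ == "__main__":
    # Hypothetical fragments of a page, before and after an edit to the source.
    old_source = {"1": "You must abide by these terms.", "2": "Thank you."}
    translated_from = {fid: digest(text) for fid, text in old_source.items()}
    new_source = {"1": "You are expected to abide by these terms.",
                  "2": "Thank you.",
                  "3": "A new paragraph."}
    print(outdated_fragments(new_source, translated_from))
```

Only the first fragment is reported: the second is unchanged and the third has never been translated, so it is simply untranslated rather than outdated.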

 

Volunteers are masters of their own time. They choose how much they want to do in one go. Making the translated text immediately available is one way in which we show appreciation for the work that is done and, at the same time, it is an invitation to other volunteers to complete the work that still needs doing.

With all the Translate functionality in place, we expect that it is easier to translate, we hope that more people will be involved and that important texts like the “terms of use” will become available in as many languages as we can find translators for.

Gerard Meijssen
Internationalization / Localization outreach consultant