Wikimedia blog

News from the Wikimedia Foundation and about the Wikimedia movement

Posts by Gerard Meijssen

The end of a slushed sprint

Consolidation was the name of the game in the past sprint for Wikimedia’s Localization team. Bug triage, testing, documentation and bug fixing were the activities meant to make our software more stable and more usable. When you read the bug triage report, it becomes clear how much the devil is in the details: real native-language expertise is needed to understand and assess the issues we aim to solve. Read the report and you will see how much we rely on our community, on people like Srikanth and Nemo_bis.

Now that we write documentation in a central place, as we do here for the language statistics of the Translate extension, we can provide you with help text that is specific to the context. For the language statistics, it is a help text about “statistics and reporting”. This functionality is ready but will only become available with the deployment of January 30. You can help us and yourself by reading and understanding the text. Ask when you have questions, and translate the text to make it that much more your own.

Narayam is another extension that has been improved with user documentation. This documentation is completely new and can effectively replace existing documentation. The existing documentation has the benefit of being written in the local language, and we expect its content to be similar to the Narayam documentation. Language communities can then decide whether they want to point to their local documentation; whether a translation is ready may be one of the considerations. Like all our software, the Narayam documentation will be available for translation.

A lot of work is going into describing the many input methods, like the InScript layout for Assamese. These descriptions are “must have” help information when you do not know a particular keyboard layout by heart. They also provide a wonderful opportunity to verify whether our implementation of a particular keyboard method is correct. This is yet another instance where native speakers can help us a lot.

Testing and coming to grips with the different tools was a major goal for this sprint. PHPUnit and QUnit are used to test PHP and JavaScript respectively, and the tests run in Jenkins (for PHP) and TestSwarm (for JavaScript). As our team is so focused on language support, we are learning where the limits of testing lie for different languages and scripts.
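One such limit shows up quickly: what a reader sees as one “character” in an Indic script is often several Unicode code points. The sketch below is a hypothetical illustration in Python (the team's real tests were written for PHPUnit and QUnit); it shows why a test that counts characters with a naive length function gets surprising results for a single Devanagari syllable.

```python
import unicodedata
import unittest

# The Devanagari syllable "क्षि" (kshi) is read as one unit but is stored as
# four code points: KA, VIRAMA, SSA and VOWEL SIGN I.
KSHI = "\u0915\u094D\u0937\u093F"


class ScriptAwareTest(unittest.TestCase):
    def test_length_counts_code_points(self):
        # len() counts code points, not what a reader perceives as characters.
        self.assertEqual(len(KSHI), 4)

    def test_combining_marks_are_present(self):
        # The virama (Mn) and the vowel sign (Mc) are combining marks that
        # attach to the consonants around them.
        categories = [unicodedata.category(ch) for ch in KSHI]
        self.assertEqual(categories, ["Lo", "Mn", "Lo", "Mc"])
```

Run it with `python -m unittest`; a length limit or truncation test written with Latin text in mind would pass for English and silently misbehave for this syllable.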

All in all, there may have been a slush and we did a lot of code review, but we also made sure that our functionality gained stability for this and future releases. Work was also done on grammar support for JavaScript, but because of the slush the story was moved to the next sprint and the patch was parked in a bug report. Grammar support fills the gap in localization support between JavaScript and PHP and makes it available to any and all other developers.
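Grammar support lets a message inflect a word, such as the name of a wiki, for the grammatical case a sentence needs. A minimal sketch in Python, assuming a lookup table shaped like MediaWiki's `$wgGrammarForms` setting (language, then case, then word); the table contents and the function name are illustrative, not the patch that was parked in the bug report.

```python
# Hypothetical override table: language code -> grammatical case -> word -> form.
GRAMMAR_FORMS = {
    "fi": {"genitive": {"Wikipedia": "Wikipedian"}},
}


def grammar(lang: str, case: str, word: str) -> str:
    """Return the inflected form of `word`, or `word` unchanged if none is known."""
    return GRAMMAR_FORMS.get(lang, {}).get(case, {}).get(word, word)


print(grammar("fi", "genitive", "Wikipedia"))  # Wikipedian
print(grammar("de", "genitive", "Wikipedia"))  # Wikipedia
```

Returning the word unchanged when no form is known keeps a message readable in languages that have no override configured.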

Thanks,
Gerard Meijssen
Internationalization / Localization outreach consultant

Sprinting ahead when there is a “slush”

When there is a code freeze or a slush, what can be delivered is curtailed. It is official: you will not deliver new code; you will work towards consolidating the new MediaWiki release.

One of the objectives for this and the next release is to shorten the time between releases. Even though the Localization team works in two-week sprints, it can help get the release out the door. The first thing to do is help even more with code review; the other is to make sure that its code is optimised for easy coding, testing and use.

When you check out Mingle (user: guest, password: guest), you will find that the developers of our team are learning about the various testing tools. They are even updating the developer documentation to make it easier to understand how to set up new automated tests.

Testing requires code that provides information about its execution, which means that code often needs to be refactored before it can be tested. Documentation is another part of the puzzle that helps stabilise code; you will find a prodigious amount of documentation scheduled for this sprint.

All this translates into a fairly minimal deployment for the first week. Its highlights are:

Translate:

  • Better error checking and handling in Special:Translate
  • Translatable page id prefix changed from page| to page-
  • Don’t reuse messages from core

WebFonts:

  • Fixed download of Vemana Telugu font
  • Added font for Ahirani (ahr)

Narayam: Some fixes to Assamese transliteration rules

Core: the cropping of text in level 1 headers is fixed for Indic languages

Thanks,
Gerard Meijssen
Internationalization / Localization outreach consultant

Addressing the many

When you have a message for many people, you use the appropriate language and tools to address them. We do not count heads with our eyes and we do not use a bullhorn to be heard. Our MediaWiki software knows the numbers involved, and a plural-enabled message is formed according to the rules of the language.
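In wikitext a plural-enabled message looks like `{{PLURAL:$1|form1|form2|...}}`, and the rule for the language picks one of the forms. The Python sketch below shows the idea for English and for Russian integers, using the CLDR categories one, few and many; the function names are hypothetical, and reusing the last form when a message supplies too few forms is our reading of MediaWiki's behaviour, not a quote of its code.

```python
def plural_index_en(n: int) -> int:
    # English: forms[0] for exactly one, forms[1] for everything else.
    return 0 if n == 1 else 1


def plural_index_ru(n: int) -> int:
    # Russian integers (CLDR): one = 1, 21, 31...; few = 2-4, 22-24...; many = the rest.
    n = abs(n)
    if n % 10 == 1 and n % 100 != 11:
        return 0
    if 2 <= n % 10 <= 4 and not 12 <= n % 100 <= 14:
        return 1
    return 2


RULES = {"en": plural_index_en, "ru": plural_index_ru}


def plural(lang: str, n: int, forms: list[str]) -> str:
    """Pick the form of a {{PLURAL:...}} message for the count n."""
    if not forms:
        return ""
    index = RULES[lang](n)
    # Too few forms supplied: fall back to the last one given (assumption).
    return forms[min(index, len(forms) - 1)]


forms_ru = ["статья", "статьи", "статей"]  # "article" in the one, few and many forms
for count in (1, 3, 11, 21, 112):
    print(count, plural("ru", count, forms_ru))
```

For 21 this selects “статья” and for 112 “статей”, because 112 ends in 12. Decimals, the fourth Russian category in the CLDR, are left out here.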

When we implemented plural support for JavaScript, we compared the new implementation with our implementation in PHP and checked both against the standard for such things, the CLDR (Unicode Common Locale Data Repository).

The Localisation team does not know the language rules for the 280+ languages that have a Wikipedia. We prefer to implement what the standard tells us, but we support more languages than the CLDR covers. We want to channel our need for support through “Language Support teams”, and we want them to help us understand and fix the inconsistencies and add the missing information to the CLDR.

Inconsistencies with the CLDR
  • Belarusian – ‘other’ form missing in MediaWiki
  • Belarusian-tarask – ‘other’ form missing in MediaWiki
  • Bosnian – ‘other’ form missing in MediaWiki
  • Manx – CLDR has 3, MediaWiki has 4 forms
  • Hebrew – CLDR has 2, MediaWiki has 3 forms
  • Croatian – ‘other’ form missing in MediaWiki
  • Ripuarian / Colognian – order of forms different. CLDR says zero, one, other. MediaWiki says one, other, zero
  • Latvian – CLDR defines zero, one, other forms. MediaWiki has only two forms, one for (1, 21, 31, 41, 51, 61…) and another for the rest.
  • Macedonian – CLDR defines forms[0] for n!=11. MediaWiki defines forms[0] for n%100!=11
  • Polish – ‘other’ form is not defined in MediaWiki.
  • Russian – CLDR defines 4 plural forms. The form for decimals is missing in MediaWiki.
  • Slovenian – MediaWiki defines a zero form which is not present in CLDR
Missing in CLDR
  • Church Slavonic
  • Lower Sorbian
  • Scottish Gaelic
  • Upper Sorbian
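Differences like the Macedonian one only show up for specific numbers, which is why comparing implementations over a range of values is useful. A small Python sketch of such a check; it assumes both Macedonian rules also require `n % 10 == 1` (the list above quotes only the part that differs), and the function names are hypothetical.

```python
from typing import Callable


def mk_one_cldr(n: int) -> bool:
    # forms[0] as described for CLDR: the "n != 11" variant.
    return n % 10 == 1 and n != 11


def mk_one_mediawiki(n: int) -> bool:
    # forms[0] as described for MediaWiki: the "n % 100 != 11" variant.
    return n % 10 == 1 and n % 100 != 11


def disagreements(rule_a: Callable[[int], bool],
                  rule_b: Callable[[int], bool], upto: int) -> list[int]:
    """Return every n in 0..upto for which the two rules give different answers."""
    return [n for n in range(upto + 1) if rule_a(n) != rule_b(n)]


print(disagreements(mk_one_cldr, mk_one_mediawiki, 1000))
# [111, 211, 311, 411, 511, 611, 711, 811, 911]
```

The two rules agree for every number below 111, so a test that only tries small numbers would never notice the difference.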

Please make a difference to the support for your language and join the Language Support team.

Thanks,
Gerard Meijssen
Internationalization / Localization outreach consultant

End of sprint 6; Translate and other goodies

Every two weeks a sprint and every week a deployment. The Localisation team aims to bring you new and updated functionality when we have it.

As you can see in the summary below, the focus this sprint has been very much on the Translate extension. We have worked on the management of translations and the translation process. When texts are translated in a wiki, they are often needed only within a specific time frame; it is now possible to mark a text as no longer needing any effort. For many languages, several people are involved in the workflow that produces a well-written translated document. To work well together, it helps when their work carries a state, so that it is clear, for instance, that something has been proofread.

The person who manages the publication and distribution of a page needs workflow states to decide what more needs to be done and what is ready. To do this, they can use states that already exist or define additional ones. These states are stored as local messages and can be translated.
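A minimal Python sketch of the idea, assuming a per-group list of states that a manager can extend, a log of state changes, and a translatable message key per state. The state names other than “proofread”, the class and the message-key format are hypothetical; the Translate extension's actual data model may differ.

```python
from dataclasses import dataclass, field

# Hypothetical default states; only "proofread" is named in the post.
DEFAULT_STATES = ["new", "translated", "proofread", "ready"]


@dataclass
class GroupWorkflow:
    states: list[str] = field(default_factory=lambda: list(DEFAULT_STATES))
    current: dict[str, str] = field(default_factory=dict)  # language code -> state
    log: list[tuple[str, str, str]] = field(default_factory=list)  # (language, old, new)

    def add_state(self, name: str, before: str) -> None:
        """Let the group manager define an extra state ahead of an existing one."""
        self.states.insert(self.states.index(before), name)

    def set_state(self, language: str, state: str) -> None:
        if state not in self.states:
            raise ValueError(f"unknown state: {state}")
        old = self.current.get(language, self.states[0])
        self.current[language] = state
        self.log.append((language, old, state))

    @staticmethod
    def label_message(state: str) -> str:
        # Each state label is a local message, so it can be translated like any other.
        return f"workflow-state-{state}"


workflow = GroupWorkflow()
workflow.add_state("reviewed", before="ready")
workflow.set_state("nl", "proofread")
print(workflow.states)  # ['new', 'translated', 'proofread', 'reviewed', 'ready']
print(workflow.log)     # [('nl', 'new', 'proofread')]
```

Keeping the states as data rather than code is what lets a manager add a state without a software change.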

Translate extension features

  • Message workflow states help translators translate, review and prepare messages for publication
  • There is now a new message group for recent translations. This message group makes these states possible in translation
  • Special:MyLanguage can now use language subpages as the default fallback instead of serving an untranslated version
  • Pages marked for translation can now be marked as “discouraged”. They will no longer show up in the usual places. This prevents translators from translating them needlessly.
  • Added {{#translationdialog:title}} for creating a link to the translation dialogue
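The fallback behaviour of Special:MyLanguage can be sketched as a walk along the user's language fallback chain. This Python version is a hypothetical illustration: the fallback data, the set of existing pages and the function name are assumptions, and it returns a page name rather than performing a redirect.

```python
def resolve_my_language(page: str, user_lang: str,
                        fallbacks: dict[str, list[str]],
                        existing: set[str]) -> str:
    """Return the best existing translation subpage of `page` for `user_lang`."""
    for lang in [user_lang, *fallbacks.get(user_lang, [])]:
        candidate = f"{page}/{lang}"
        if candidate in existing:
            return candidate
    # No translation anywhere in the chain: serve the source page itself.
    return page


pages = {"Help:WebFonts", "Help:WebFonts/de", "Help:WebFonts/nl"}
chains = {"de-at": ["de"], "li": ["nl"]}
print(resolve_my_language("Help:WebFonts", "de-at", chains, pages))  # Help:WebFonts/de
print(resolve_my_language("Help:WebFonts", "fr", chains, pages))     # Help:WebFonts
```

A reader with an Austrian German interface gets the German subpage rather than the untranslated original.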

Translate bug fixes

  • The flash of unstyled content effect is reduced
  • Made the extension work without legacy JavaScript globals
  • The summary row in Special:LanguageStats and Special:MessageGroupStats is no longer sorted with the rest of the rows.
  • Fixes to the sizing of the translation editor dialogue
  • Fixed a fatal error that sometimes occurred when a translation page title used GRAMMAR and the page was viewed with the English UI.

Miscellaneous changes

  • ParserFunctions ifexist magic word Italian translation fixed to ‘ifexist’
  • Narayam preference wording changed from disable to enable
  • The WebFonts icon no longer overlaps with the menu text
  • WebFonts preview allows you to preview a text with a font. You can download these freely licensed fonts to your system.
  • GENDER and PLURAL support are now available for use in JavaScript.
  • Consistency updates for grouppage-* messages, for LocalisationUpdate
  • Fixing be-tarask grammar forms
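GENDER works like PLURAL but picks a form from the user's gender preference: `{{GENDER:username|male form|female form|neutral form}}`. A Python sketch of the selection; the handling of missing forms (falling back to the first form) is our understanding of MediaWiki's rule and is marked as an assumption in the code.

```python
def gender(user_gender: str, forms: list[str]) -> str:
    """Pick the form of a {{GENDER:...}} message for "male", "female" or "unknown"."""
    if not forms:
        return ""
    # Assumption: a missing female or neutral form falls back to the first form.
    male = forms[0]
    female = forms[1] if len(forms) > 1 else male
    neutral = forms[2] if len(forms) > 2 else male
    return {"male": male, "female": female}.get(user_gender, neutral)


print(gender("female", ["he", "she", "they"]))  # she
print(gender("unknown", ["he", "she"]))         # he
```

Any preference other than "male" or "female" is treated as unknown and gets the neutral form.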

Changes deployed last week

  • WebFonts was deployed for the Bishnupria Manipuri language; it uses the Lohit Bengali font
  • Support for gendered name spaces was deployed for the Russian wikis.

As always, you are welcome to have a look at our sprint backlog (user: guest, password: guest) and bug us in Bugzilla with whatever needs fixing.

Thanks,
Gerard Meijssen
Internationalization / Localization outreach consultant

The localisation team sprints into the new year

WebFonts is the first extension that gets user documentation served from MediaWiki.org. At the time of writing, the documentation has been written, it serves people with help text about WebFonts, and it is ready for translation. People looking for help will be served help in the language of their user interface if there is a translation.

WebFonts drop down on or.wikipedia.org

In a way it seems like a minor thing, but consider:

  • MediaWiki can serve help texts for its functionality
  • this help text may differ based on the language of the user
  • the help text can be translated
  • a new community for MediaWiki help text translation is needed
  • functionality like Narayam will surely get its user documentation in the near future

It will be a challenge for other developers and developer teams to adopt and refine the way assistance is provided to our users. We learned at translatewiki.net that documentation improved the quality of the localisations. We hope that user documentation will reduce confusion and make for happy editors and readers.

The WebFonts user documentation was deployed last Tuesday. This and some other changes can be found in the deploy list. As the holiday season is in full swing, sprint 6 has started; it will run into the new year.

In this sprint, stories will be developed that will make “Translation review” feature complete. When this is implemented, it will help translators and localisers review each other's work and assign a status to it for further consideration. As you can imagine, the different statuses themselves will become available for translation; card 326 defines this and will make it possible. This is just one of many stories that make up this feature.

For the localisers of the MediaWiki software, a long-held ambition will be realised: card 206 will see “plural” support implemented for JavaScript. When this functionality is deployed, it will lead to a long list of future changes to the actual messages.

The new year will bring many new challenges and opportunities to the many language communities. The Wikimedia Localisation team will work hard to provide you with the tools to be efficient in any language, to get our message out and to provide information in any language. For some of us the new year starts at a different moment, so it will be very much business as usual; we welcome you to have a look at our sprint backlog (user: guest, password: guest) and bug us in Bugzilla with whatever needs fixing.

Thanks,
Gerard Meijssen
Internationalization / Localization outreach consultant

Localisation team sprint 5 update II

Probably the most interesting highlight of today’s i18n deployment is the configuration of the Translate extension on MediaWiki.org. We have observed that some wikis have special pages that explain functionality like Narayam or WebFonts in the language of the wiki. Such documentation is welcome on every MediaWiki installation where the functionality is used by people with that language as their user interface language.

For writing the documentation MediaWiki.org is the obvious platform. With the deployment of Translate we have the basis for writing and translating user documentation in a structured and organised way.

Narayam and WebFonts have been updated to the latest versions that have been tested on translatewiki.net. As Narayam and WebFonts are still very much a work in progress, we invite anyone to continue testing at translatewiki.net. The changes are:

  • menu appears only on click, not when hovering
  • menu positions are now correct for RTL languages and do not go off screen any more
  • Narayam and WebFonts support the Kannada script for the Tulu language on the Incubator

There are also some smaller fixes, among them the change of the autonym for the Veps language to “Vepsän kel”. The full details of all the changes are in revision 106667.

Thanks,
Gerard Meijssen
Internationalization / Localization outreach consultant

Localisation team sprint 5 update

With a new sprint, new functionality for MediaWiki is identified to be deployed in two weeks' time. There is room for dealing with issues in Narayam and WebFonts. Many of the new activities have to do with documentation, translation and feedback.

The sprint backlog in Mingle (user: guest password: guest)

What we hope for is that the feedback functionality that is now part of MediaWiki can be used to ask for feedback on MediaWiki features. It is obvious that the Wikimedia Localisation team cannot support all the 300+ languages that have their own projects or exist in the Incubator. What we can do is process the information we get from our language support teams. Figuring out how to do this is one of the goals for this sprint.

The use of Narayam and WebFonts will be helped a lot by documentation; “where do I find that character on this keyboard mapping?” or “what does an international keyboard look like?” are questions looking for an answer. Determining how to document and what to translate is not all that obvious. Because keyboard maps and fonts are distributed as part of MediaWiki, documenting them on a single local wiki does not scale to other Wikimedia wikis, and MediaWiki wikis outside the Wikimedia Foundation are just as much in need of documentation. When people start using MediaWiki because of such language support features, we accomplish real support for a language.

For this sprint, these questions are looking for an answer, and in the meantime the Translate extension will gain these new features:

  • Documents that need translation can be grouped together; for instance all the Fundraiser messages or Wikimedia reports
  • Documents can be marked as no longer needing translation
  • Changes to the state of documents and translations will be logged and the log will be available for viewing
  • Depending on the state of a document or a translation, attention can be drawn when there is a need for activity

User documentation needs translation, and hopefully many of the algorithms used for the localisation of MediaWiki at translatewiki.net will apply equally to user documentation. Life will become a lot easier for all those people who administer MediaWiki and have only a basic understanding of English. We hope to deliver this in one of our future sprints.

Thanks,
Gerard Meijssen
Internationalization / Localization outreach consultant

Localisation team updates going live, December 12 2011

Every Monday, the Wikimedia Localisation team has a window of opportunity to roll out new and improved functionality. This release is at the end of an Agile sprint and it reflects the stories that our developers committed to at the start of the sprint. Because there are multiple stories, what is delivered can and does cover different functionality; today is no different:

  • It features the launch of WebFonts for selected Indic languages and projects
    • All Assamese, Bengali, Gujarati, Hindi, Kannada, Marathi, Nepali, Oriya, Punjabi, Sanskrit, and Telugu wikis
    • The Malayalam and Tamil wikis will not be supported by WebFonts for now
  • Narayam has several new keyboard methods and mappings, an improved UI, and support for the Modern and MonoBook skins
  • Bug 31330 changes the preference to Babel extension information
    • this improves the coexistence of Babel information in templates and the extension
  • The cropping of text in the headers of many Indic languages is fixed

When you frequent translatewiki.net, you will have seen it all. When you follow the Bugzilla bugs for Internationalization, you may have commented on the issues that are now being resolved. For most Wikimedians the existence of all this hardly registers; it does not affect their language or their community. When it does affect their language and their community, it is very much a road towards editing in their language as easily as in English.

We are eager to learn about any issues on our IRC channel. Bugs are best reported at Bugzilla.

Thanks,
Gerard Meijssen
Internationalization / Localization outreach consultant

Ready for the WebFonts launch

After months of preparation, demonstrating the latest versions in person and online, going through tons of feedback and implementing the resulting modifications, we are ready for the launch of WebFonts. Web fonts are a technology that ensures that the readers of our wikis will always see the intended characters on their screen. Many devices do not provide the fonts that allow people to read their mother tongue.

When people do not even see what we aim to provide to them, we fail. According to the Wikipedia article, web fonts are considered “controversial” because the licenses of many fonts prevent them from being used as web fonts. There is no such controversy when freely licensed fonts are used, and we are really happy with our collaboration with the producers of such fonts. We learned that fonts working on one platform do not necessarily work as well on another platform or operating system.

Enabling people to read and write their language is our prime objective at this time, and people are happy when they find they can, as they were at the localisation sprint in Pune. Being able to type Marathi or Punjabi, Hindi or Tamil on a thin client put a smile on many faces. They used the latest software at translatewiki.net, and the feedback we got from them and others has resulted in many technical and usability improvements.

The launch of WebFonts together with the Narayam improvements on Monday 12 December represents significant progress in enabling Indic language contributions to our projects; it consists of a large amount of code, it will be implemented on a selected range of wikis and it affects many communities: those of the wikis in Assamese, Bengali, Gujarati, Hindi, Kannada, Malayalam, Marathi, Nepali, Oriya, Punjabi, Sanskrit, Tamil and Telugu. All these communities have been involved in testing the evolving functionality at translatewiki.net, and the comments and bug reports we received were essential for what we are now proud to present. With the launch, more people will experience the WebFonts technology for the first time. We are eager to improve on what we have because we believe that web fonts technology is crucial for the emancipation of many languages and scripts in this digital age.

Thanks,
Gerard Meijssen
Internationalization / Localization outreach consultant

Supporting the languages of India

India is different. Given that India is very strategic for the Wikimedia Foundation, the question is what we can do to raise the profile of our projects and what we can do to support the Indic languages effectively.

Many well-educated people, people with a university education, are effectively illiterate in their own language. A Wikipedia in their own language does not tempt them to get involved. They do not have the skills, even though it would not be that hard for them to learn to read and write their mother tongue. Two things make writing the Indic languages easier: the scripts are highly phonetic, and InScript, the dominant keyboard layout for Indic languages, ensures that the same sound is always in the same place.
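The Unicode blocks for the major Indic scripts follow a shared layout inherited from ISCII, so a letter usually sits at the same offset in each block; this makes the “same sound, same place” property of InScript easy to illustrate. A Python sketch under that assumption: the four keys are a small illustrative subset of the Devanagari InScript layout, the function name is hypothetical, and not every offset is assigned in every script (Tamil, for example, has far fewer consonants).

```python
import unicodedata

# Start of each script's Unicode block. The blocks share an ISCII-derived layout,
# so the same offset usually holds the same sound.
BLOCK_START = {
    "devanagari": 0x0900,
    "bengali": 0x0980,  # also used for Assamese
    "gurmukhi": 0x0A00,
    "gujarati": 0x0A80,
    "oriya": 0x0B00,
    "tamil": 0x0B80,
    "telugu": 0x0C00,
    "kannada": 0x0C80,
    "malayalam": 0x0D00,
}

# A few home-row InScript keys and the Devanagari letters they type: KA, TA, PA, RA.
INSCRIPT_DEVANAGARI = {"k": "\u0915", "l": "\u0924", "h": "\u092A", "j": "\u0930"}


def inscript(key: str, script: str) -> str:
    """Return the letter a key types in `script` by shifting the Devanagari code point."""
    offset = ord(INSCRIPT_DEVANAGARI[key]) - BLOCK_START["devanagari"]
    letter = chr(BLOCK_START[script] + offset)
    if unicodedata.name(letter, None) is None:
        raise ValueError(f"{script} has no letter at offset {offset:#x}")
    return letter


for script in ("devanagari", "bengali", "gujarati", "telugu", "kannada"):
    print(script, inscript("k", script))
```

Someone who learns where KA is on the keyboard for one script has, in effect, learned it for the others.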

When our goal is to get more people involved in the Indic languages, we can ask people to transcribe scans of public domain books. We would provide them with a keyboard mapping and the fonts that display their language. As these “illiterates” recognise the characters and reproduce them digitally, they not only learn to type their language; they may even learn to read it. When we recognise their effort in a thank-you note accompanying the book, experience teaches us that they are likely to help us in future projects.

The project that is already making a big impact in India in this way is the Malayalam Wikisource project. They published a CD with a year's worth of sources and distributed it to the schools of Kerala. They produce software that ensures that the content looks really good. The software as well as the content is available on the internet, but sadly this full experience cannot be had on Wikisource itself.

When a new book becomes available, the Malayalam press often mentions it in their periodicals, so much so that Wikisource is mentioned in the press more often than Wikipedia.

Similar projects for other Indic languages have been a popular topic at the WikiConference India; it was discussed at least for Sanskrit and Tamil. The discussion was not only about the organisation of such a project but also about internationalising the software that prepares the final product and about using Kiwix for presenting it. When you consider how much literature is available in the Indic languages that is already in the public domain, this is a project that will run and run.

Preparing sources in Wikibooks or Wikisource in a collaborative way makes sense in a wiki. Once the work is done, however, the content can be published in all kinds of formats. This is important because we want it to be read as widely as possible; this is how we best realise our objectives.

Jimmy was right when he said in his speech that the Indic language communities can learn from each other and do really well. However, these best practices can be applied to any Wikisource or Wikibooks.

Thanks,
Gerard Meijssen
Internationalization / Localization outreach consultant