Wikimedia blog

News from the Wikimedia Foundation and about the Wikimedia movement

Internationalization and localization

New release of the MediaWiki Language Extension Bundle, and other updates

Highlights from the latest development sprint of the Language Engineering team include the release of a new version of the MediaWiki Language Extension Bundle, and continued progress on Translation User Experience (UX) and the Language Coverage Matrix.

Screenshot of the redesigned proofread view for the Translate extension showing translations in Georgian.

Design and development improvements continued for Translate UX, also known as TUX. A preliminary implementation of the Proofreading feature (per the specifications in the design document) lets users view messages side by side, adds clickable markers for proofreading, and supports switching between proofreading and translation mode. Pau Giner presented these updates at an open session and also invited users to join the ongoing usability tests.

Amir Aharoni announced the release of MediaWiki Language Extension Bundle (MLEB) 2013.02. Besides localization updates in most of the components within MLEB, more features were added to Translate UX. The Universal Language Selector, however, had to be rolled back to the 2012.12 version to ensure compatibility with MediaWiki 1.20.

The Language coverage matrix document was updated to include more information about web fonts and input methods that are currently available for use in MediaWiki and Wikimedia projects. The document aims to provide an overview of the internationalization and localization support in languages across Wikimedia projects.

As part of the ongoing effort to use a CLDR-based, data-driven approach for internationalization features, plural rules for many languages were analyzed and custom rules were removed for a few languages.
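To illustrate what a data-driven approach to plural rules can look like, here is a minimal sketch in Python. It is not MediaWiki's implementation: the table name `PLURAL_RULES` and the function `plural_category` are made up for this example, the rules are simplified to non-negative integers only, and only three languages are included. The idea is that each language's rules live in a data table modeled on CLDR's plural categories, so no language needs custom code.

```python
# Simplified, integer-only plural rules modeled on CLDR categories.
# PLURAL_RULES and plural_category are hypothetical names for this sketch.
PLURAL_RULES = {
    # English: "one" for exactly 1, everything else is "other".
    "en": [
        ("one", lambda n: n == 1),
    ],
    # Russian (integers only): one / few / many.
    "ru": [
        ("one", lambda n: n % 10 == 1 and n % 100 != 11),
        ("few", lambda n: 2 <= n % 10 <= 4 and not 12 <= n % 100 <= 14),
        ("many", lambda n: n % 10 == 0
                           or 5 <= n % 10 <= 9
                           or 11 <= n % 100 <= 14),
    ],
    # Japanese: no plural distinctions, so every number is "other".
    "ja": [],
}


def plural_category(language, n):
    """Return the plural category for a non-negative integer n."""
    for category, matches in PLURAL_RULES[language]:
        if matches(n):
            return category
    return "other"


for number in (1, 2, 5, 11, 21):
    print(number, plural_category("en", number), plural_category("ru", number))
```

Adding a language then means adding one more table entry rather than writing a new function, which is the main appeal of keeping the rules as data.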

The Language Engineering team will be hosting an IRC office hour session on Wednesday, March 13, 2013, in #wikimedia-office (FreeNode server) at 17:00 UTC. Topics will include discussion, questions and feedback about current projects, open bugs, and projects planned for the next sprint.

Runa Bhattacharjee, Outreach and QA coordinator, Language Engineering

Language engineers improve translation tool and meet with their peers

Quem não arrisca não petisca — a Portuguese proverb

During their latest development sprint, the Wikimedia Language Engineering team conducted extensive review and testing of the Translate extension, and participated in and contributed to two major open source events in India: a core developers Language Summit and GNUnify.

User experience improvements to the Translate tool will notably make it easier and more pleasant to translate content on Wikimedia sites that use it.

Translate Editor Updates

Progress continued on enhancements to the MediaWiki Translate extension. Pau Giner conducted further usability testing of the translation editor, the search feature, and a prototype of the advanced editing features with five users from four different countries. The prototypes were tested in a wide range of languages, including Nepali, Chinese, Tetum, French, Breton, and Finnish. Based on this feedback, changes were made to the style and specifications of the prototype. Details about the individual tests can be found in the final report for this round of testing.

Community Participation

The Language Engineering team participated in the Open Source Language Summit and GNUnify, both held in Pune, India. The Open Source Language Summit, co-organized by the Wikimedia Foundation and Red Hat, consisted of work-sprints that focused on internationalization (i18n) and localization (l10n) features, font support, input method tools, language search, i18n testing methods and standards. More information about the event is available in the detailed event report.

The team also participated in GNUnify 2013, held at the Symbiosis Institute of Computer Studies and Research in Pune. Besides presenting the various projects that it is currently working on, the team organized a translation sprint on translatewiki.net, a workshop on jQuery.IME, and a BoF session to discuss issues related to Wikimedia projects in Indian languages. Details of the accomplishments from the sessions at GNUnify 2013 can be found in the event report.

Other Achievements

Additionally, some changes to MediaWiki core were backported to support the newer version of the Universal Language Selector on MediaWiki versions 1.19 and 1.20. As there is no released maintenance version yet, MediaWiki Language Extension Bundle (MLEB) users are advised to remain on MLEB version 2012.12.

Focus for the next sprint

For the next development sprint, the team will work on more features for the Translate extension, like the proofreading mode and further improving the user experience. In addition to this, focus will be on putting together the language coverage matrix as a reference for the status of language support on MediaWiki, MediaWiki Extensions and Wikimedia projects.

Runa Bhattacharjee, Outreach and QA coordinator, Language Engineering

Language Engineering team participates in GNUnify 2013

Det vatten du hämtar ur bäcken lär dig känna källan – a Swedish proverb

GNUnify is an annual gathering consisting of workshops, talks & seminars, held to help increase the awareness of free and open-source software in India.

The Wikimedia Language Engineering team participated in GNUnify 2013 held in Pune, India on February 15–17. The team presented their work, conducted a translation sprint, organized workshops and also participated in discussions with local Wikipedians about using MediaWiki and Wikimedia projects in their languages.

Presentations by the team

Runa Bhattacharjee presented about the changing dynamics in the adoption of localized content and the need for developing tools that facilitate new demands. She introduced the projects that the Language Engineering team is working on. Siebrand Mazeland and Niklas Laxström gave a walkthrough of the MediaWiki Translate extension and the translatewiki.net platform, and showcased the new design and features of the updated translation editor.

Santhosh Thottingal presented how the jQuery libraries of Project Milkshake can be used to prepare multilingual web applications for internationalization; he also presented a tutorial on their use. Amir Aharoni demonstrated how easy it is to use the input methods provided by the jQuery.IME library and how to contribute phonetic keymaps. He encouraged developers to use the library's input methods, currently more than 140, in their web applications. Yuvaraj Pandian demonstrated how he ported jQuery.IME for use on Android devices.

Alolita Sharma spoke about technologies and tools that help people contribute to Wikipedia in various languages. She highlighted the need for features and tools to support non-English Wikipedias, and the solutions that the Language Engineering team is developing to help eliminate fundamental hindrances that contributors face when trying to create content in Indian languages. She also spoke about the other Wikimedia projects that are open for participation.

Workshops

Amir Aharoni conducted a workshop on the jQuery.IME library, in which he demonstrated the procedure to add a new input method and submit it for inclusion on GitHub. A two-hour translation sprint was conducted in which almost 40 participants translated various projects hosted on translatewiki.net. At the end of the session, more than 1000 completed translations were logged and prizes were distributed for the most significant contributions. Yuvaraj Pandian, Sucheta Ghoshal and Harsh Kothari conducted a workshop on building MediaWiki gadgets. Participants were introduced to the process of creating gadgets using JavaScript and CSS, and making them available for other users.

Language Engineering BOF session

The Language Engineering team also organized a session to discuss technical issues related to Wikimedia projects in Indian languages, which was attended by local Wikipedians. Issues related to following up on internationalization and localization bugs and building local technical user groups were discussed.

To conclude, participation in open source conferences such as GNUnify helps make more open source developers and language Wikipedians aware of the latest tools that the Language Engineering team is developing for them to use. It also lets the team receive direct feedback from the global communities it serves.

More information can be found in the detailed report.

Runa Bhattacharjee, Outreach and QA coordinator, Language Engineering

Report from the Spring 2013 Open Source Language Summit

Fortuna i forti aiuta, e i timidi rifiuta — an Italian proverb

The Wikimedia Foundation and Red Hat jointly organized the Second Open Source Language Summit on February 12th and 13th, 2013. The summit was held at the Red Hat engineering center in Pune, India. Similar to the previous summit, this face-to-face work session was focused on internationalization (i18n) and localization (l10n) features, font support, input method tools, language search, i18n testing methods and standards. The sessions were work sprints, each with special focus on a key area. Participants included core contributors from the Wikimedia Foundation, Red Hat (including Fedora SIG members), KDE, FUEL, Google and C-DAC. Below is a summary of what was accomplished during these two days.

During the summit, teams from different organizations came together to discuss language-related challenges, and worked together on features and tools to address them.

Input Methods

Parag Nemade and Santhosh Thottingal worked on making additional input methods available for the jQuery.IME library. They added 60 input methods, covering languages such as Assamese, Esperanto, Russian, Greek and Hebrew, bringing the total to 144. They also identified IMEs from the m17n library that are still missing from the jQuery.IME library.

Translation tools, translatewiki.net & FUEL Sprint

Siebrand Mazeland and Niklas Laxström, together with Ankit Patel, Rajesh Ranjan and Red Hat language maintainers, worked to identify more tools that could be used as translation aids in a translation system. The FUEL project aims to standardize translations for frequently used terms, translation style and assessment methodology. Until now it has focused mostly on languages of India. The FUEL project can now be translated on translatewiki.net. Pau Giner demonstrated new designs for the translation editor and terminology usage, remotely from Spain.

Language Coverage Matrix

To better evaluate the needs for enabling support for languages, a matrix detailing the requirements and availability of basic and extended features is being drawn up. With 285 languages currently supported in Wikimedia and more than 100 in Fedora, this document will be instrumental in bridging the gaps and porting features across projects and platforms. Key areas of evaluation include input methods, fonts, translation aids like glossaries and spell-checkers, testing and validation methods, etc. A preliminary draft was created during the summit by Alolita Sharma, Runa Bhattacharjee and Amir E. Aharoni.

Fonts, WebFonts

An initiative to document the technical aspects of fonts for the scripts of languages spoken in India started during the language summit. For each script, a reference font will be chosen and explained in detail, using the OpenType font specification as the standard. The document aims to serve as a reference for any typographer working on Indian language fonts. An initial draft and outline of this document was prepared during the second day of the language summit, mainly by Santhosh Thottingal and Pravin Satpute.

Testing Internationalization Tools

Finding suitable methods for testing internationalized components and contents was the major focus of this sprint, with the Fedora Localization Testing Group (FLTG) and Wikimedia’s Language Engineering team sharing details of their testing methods. The FLTG conducts Test Days prior to Fedora beta releases with a test matrix targeted at specific core components, and Wikimedia uses unit tests for frequent testing of their development features. The FLTG showed its plans to integrate the screenshot comparison method for testing localized interfaces. This method will be useful for Wikimedia too. Extending the method for web-based applications and Wikimedia’s language requirements (e.g. right-to-left) were identified as areas for collaboration.

More news from the Language Summit can be found in the tweets, the session notes and the full report.

Runa Bhattacharjee, Outreach and QA coordinator, Language Engineering

Inching towards enabling our improvements to the Translation user experience

Lel the tacho pirrow, an’ it’s pars kaired — A Romani proverb

The Wikimedia Language Engineering team just completed its most recent development sprint, which introduced a new iteration of the Translation Editor within the Translate extension and added features to make it a more satisfactory translation workspace. The primary focus during this sprint has been to make the editor ready for production use. Some members of the team also attended FOSDEM 2013 in Brussels.

Translate interface features & enhancements

Paste Source Text  — Often found in translation editors, this feature allows for the source text to be easily copied over into the translation edit box. It’s now available in Translate and is particularly helpful when large portions of the source messages can be reused in the translation.

Message Documentation Display — Details about the messages within a project for translation can now be seen for all messages in a page, by picking the special “Message Documentation” language in the “Translate to” selector. This advanced option allows translators to view and evaluate the context for the messages that they are translating and also to see all the messages that were not documented yet.

The Message Documentation window of the MediaWiki Translate extension provides context for individual messages being translated.

Translation Editor UI  — The other enhancements that help translators to quickly review messages include:

  1. Unchanged translations marked as “outdated” can be marked as suitable for use using the Confirm Translation button.
  2. When translating a message, the translation aids of the subsequent message get preloaded to avoid any delay during navigation.
  3. Groups of messages, especially within Translation Pages with longer content, can now be set to a different state through a button click on a redesigned interface. This feature helps in identifying the Pages that can be pushed for publication.
  4. Machine Translation suggestions from Apertium, Microsoft, and Yandex can now be dynamically presented for each message on the editor.

Besides the above, Search and Translation Editor were cross-integrated for translators, to edit Translations directly from the page displaying search results.

Search and Translation Editor were cross-integrated for translators, to edit Translations directly from the page displaying search results.

Pau Giner conducted a walkthrough of Translate user experience improvements, demonstrating the current state of development and the upcoming features for this extension.

Translate API changes — Changes to the Translate API now provide more information for the developers via the Web API, to help them implement customized translation interfaces.

During the development cycle, the team also engaged with the larger community to gather feedback about the new features through usability tests.

In other good news, jQuery.ime was successfully implemented on the Koha Library management system (v. 3.10) by Indranil Dasgupta. Also, do see this wonderful video about jQuery.ime by Chris Forno that blew us away.

Focus for the next sprint

Further enhancements to Translate continue to be the main focus for the next sprint. This includes review and testing of the latest designs. The Language Engineering team hopes to have more interaction in this regard at the Open Source Language Summit (organised and hosted in collaboration with Red Hat) and at GNUnify in Pune, India.

Runa Bhattacharjee, Outreach and QA coordinator, Language Engineering

Help us test and investigate VisualEditor

We need your help to test VisualEditor and uncover bugs before we enable it on more wikis.

One of the most important and challenging software development projects at the Wikimedia Foundation right now is VisualEditor: a rich-text editor for Wikipedia that does not require users to learn MediaWiki’s markup syntax. Today, we need your help to make it more robust and reliable.

The alpha version of VisualEditor enabled on the English Wikipedia in December was focused on basic functionality. We’re now moving toward supporting more complex editing operations, notably involving non-Latin characters and character sets.

In order for all language editions of Wikipedia and its sister projects to benefit from VisualEditor, we need to test it extensively, and we need your help to break it (and fix it) before we enable it everywhere.

Non-Latin characters (like math symbols: ⟂) and scripts (like Chinese: 嘗試, and Hebrew: סה) can be more difficult to support than the set of Latin characters we use for example in English.

Starting today (Monday, January 28th, 2013) and continuing all week long, we need your help to test how VisualEditor functions when working with non-Latin characters. We’re relatively confident that VisualEditor can reliably load a wiki page and save that page without losing any information. What is less clear is whether it behaves properly when manipulating non-Latin text, special characters, and other less common aspects of the greater set of Unicode characters.

If you care at all about VisualEditor, internationalization and localization, accessibility, or you simply enjoy hunting down bugs in software, join us this week to identify those issues! You’ll help to improve VisualEditor before it’s enabled more widely.

Our test plan should tell you everything you need to know to get started. We’re also available on IRC for real-time collaboration; all the details are in our coordination page.

The Wikimedia Foundation’s software development model is iterative: we release software early, get feedback, improve it, get more feedback, etc. We’ve set up a dedicated group for this kind of testing that you may want to join. At this time, thoughtful feedback about how VisualEditor manages non-Latin characters is crucial to the next steps of our new editor. We hope to take these steps with you.

Chris McMahon, QA Lead

Language Engineering: Progress With Input Methods and Translation Editor

Batti il ferro finché è caldo —an Italian proverb

In its last two-week sprint, the Wikimedia Language Engineering team worked with developers from other teams to improve its keyboard support and continued working on the new user interface for the Translate extension.

Input methods: More languages and support for mobile devices added

jquery.ime, Wikimedia’s portable keyboard layouts library, got boosts from two sources during the last sprint.

Yuvi Panda from the mobile team refactored jquery.ime to make it usable on mobile phones. Now, over 60 keyboard layouts that are supported by jquery.ime will also be usable on Android mobile phones. If you’d like to try an early testing version of the mobile keyboard layouts and help develop them, head to the mobile keyboard layouts GitHub repository.

Engineers from Red Hat also joined the input method development effort and added new and improved layouts for the Gujarati, Punjabi, Tamil, Malayalam, Kannada and Telugu languages, spoken by millions of people in India.

If a keyboard layout for your language is missing, you can send a pull request to the main jquery.ime repository.
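For readers curious about what a phonetic keyboard layout boils down to, here is a rough sketch in Python. It does not use jquery.ime's actual layout format or API; the names `ESPERANTO_X_RULES` and `type_keys` are invented for this illustration. It assumes a layout can be described as a table of typed sequences and their replacements, using the well-known Esperanto "x-system" (lowercase only) as the example.

```python
# Hypothetical rule table: typed sequence -> replacement character.
# This mirrors the Esperanto x-system, lowercase letters only.
ESPERANTO_X_RULES = {
    "cx": "ĉ",
    "gx": "ĝ",
    "hx": "ĥ",
    "jx": "ĵ",
    "sx": "ŝ",
    "ux": "ŭ",
}


def type_keys(keys, rules=ESPERANTO_X_RULES):
    """Simulate typing `keys` one keystroke at a time.

    After each keystroke, if the end of the text matches a rule,
    the matched sequence is replaced by the rule's output.
    """
    text = ""
    for key in keys:
        text += key
        for sequence, replacement in rules.items():
            if text.endswith(sequence):
                text = text[: -len(sequence)] + replacement
                break
    return text


print(type_keys("ehxosxangxo cxiujxauxde"))
```

A real layout also has to deal with uppercase letters, escape sequences and context, so treat this only as a picture of the basic idea: a new layout is mostly a new table of rules.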

Progress on translation editor user experience

The team continued fixing and improving the new translation editor, getting it ready to release. Some of the recent improvements include:

  • The most relevant translation memory suggestions are shown at the top.
  • Messages are loaded automatically when the user scrolls to the bottom of the page.
  • The status bar at the bottom of the page shows information about the status of the translations.
  • Recently translated projects are now displayed correctly in the project selector.
  • Discouraged translation projects are omitted from the group selector.
  • Message documentation can now be edited inside the translation editor.
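As an illustration of the first item in the list above, here is a minimal Python sketch of ranking translation memory suggestions by relevance. This is not how the Translate extension computes relevance; it assumes, for the sake of the example, that relevance is simply the textual similarity between the message being translated and previously translated source texts, measured with the standard library's `difflib.SequenceMatcher`. The function name `rank_suggestions` and the sample data are made up.

```python
from difflib import SequenceMatcher


def rank_suggestions(source_text, memory):
    """Return (score, old_source, translation) tuples, best match first.

    `memory` is a list of (old_source, translation) pairs. The score is
    a similarity ratio between 0.0 and 1.0.
    """
    scored = [
        (SequenceMatcher(None, source_text, old_source).ratio(),
         old_source,
         translation)
        for old_source, translation in memory
    ]
    # Highest similarity first, so the most relevant suggestion is on top.
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored


# Made-up translation memory (English to French).
memory = [
    ("Delete the page", "Supprimer la page"),
    ("Save the page", "Enregistrer la page"),
    ("Log out", "Se déconnecter"),
]

for score, old_source, translation in rank_suggestions("Save the page", memory):
    print(f"{score:.2f}  {old_source!r} -> {translation!r}")
```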

A video showing some of the recently deployed features in action: most relevant translation suggestions shown first; inline translation documentation editor; automatic loading of messages when scrolling; experimental faceted search page for translations.

The features that were already implemented were also tested with real users by the team’s interaction designer, Pau Giner. The issues that the users reported were noted and will be fixed in the coming sprints.

Niklas Laxström implemented Faceted search for translations using the Free Apache Solr engine and deployed an experimental version of the translation search on the testing site. He also made an open presentation about Solr and its upcoming use in translatewiki.net. You can watch Niklas’ presentation about Solr on YouTube.
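For readers unfamiliar with the term, faceted search means showing, next to the results, how many matches fall into each category (for example, each language) and letting the user narrow the results by picking one. The sketch below shows that idea in plain Python. It does not use Solr or its API; the field names, sample records and the functions `facet_counts` and `filter_by_facets` are all assumptions made for this illustration.

```python
from collections import Counter

# Made-up search results; the field names are assumptions for this sketch.
results = [
    {"text": "Enregistrer", "language": "fr", "group": "MediaWiki"},
    {"text": "Enregistrer la page", "language": "fr", "group": "MediaWiki"},
    {"text": "Tallenna", "language": "fi", "group": "MediaWiki"},
    {"text": "Tallenna", "language": "fi", "group": "Etherpad Lite"},
]


def facet_counts(documents, field):
    """Count how many documents have each value of `field`."""
    return Counter(doc[field] for doc in documents if field in doc)


def filter_by_facets(documents, **selected):
    """Keep only documents whose fields match every selected facet value."""
    return [
        doc for doc in documents
        if all(doc.get(field) == value for field, value in selected.items())
    ]


print(facet_counts(results, "language"))
finnish = filter_by_facets(results, language="fi")
print(facet_counts(finnish, "group"))
```

A search engine computes these counts on the server over its whole index, which is what makes the approach practical for large collections of translations.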

Next week some of the team members are going to participate in the FOSDEM conference and, after that, in the 2nd Language Summit at the Red Hat offices in Pune.

Amir Aharoni, Software Engineer (Internationalization), Language Engineering team

A more efficient translation interface

De mica en mica s’omple la pica i de gota en gota s’omple la bota. —a Catalan proverb

During its most recent development sprint, the Wikimedia Foundation’s Language Engineering team continued to improve the user experience of the Translate extension to make it as smooth and efficient as possible. Highlights include:

  • Pressing the “Save” button immediately shows the next string to translate, while the saving is performed in the background.
  • When progressing to the next translation, the page smoothly scrolls up.
  • Explanations about translatable strings are shown beside the corresponding message in a convenient box, which becomes expandable if the documentation is too long.
  • Machine Translation was made available for suggested translations.
  • The differences between older versions of translatable strings are also shown in a new expandable box.
  • The Language Selector API was updated to allow displaying all the documentation strings.
  • The Solr search engine schema was tweaked to make searching translatable strings more efficient and feature-rich by offering faceted search.

Below is a brief demo of the latest features of the translation editor in action. It shows the Etherpad Lite project being translated into Russian.

The Language team also continues to work on squashing bugs and adding prioritized features. You can check out the latest bleeding edge version of the translation editor on translatewiki.net, or go back to the stable translation editor. Please report Translate bugs in Bugzilla.

Amir E. Aharoni, Software Engineer (Internationalization)

Translation editor growing snazzier

(Emor me’at ve-ase harbe. —a Hebrew proverb)

The Wikimedia Foundation’s Language Engineering team is continuing the makeover of the Translate extension, which started taking shape in early December. (Introduced in 2011, this MediaWiki feature powers the translation of Wikipedia’s software, announcements, reports and fundraising banners, and of other sites and software projects.)

During its latest two-week sprint, the team improved the actual interface used for submitting and editing translations:

A screenshot of the new work-in-progress translation editor

Information about the message and translations to other languages are now shown in a collapsible box on the right side of the translation area. Warnings about potential errors in the message are shown in a small box above the editing area, which is expandable, too.

The functionality for saving and skipping messages was updated. Usability testing observations by Arun Ganesh and Pau Giner suggest that users facing a hard part in a translation are more likely to just skip it than to report the problem. Because of this, skipping a message is now recorded and frequently skipped messages will be considered for re-wording.
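The skip-recording idea can be pictured with a small Python sketch. This is not the extension's code: the class name `SkipTracker`, the threshold of three skips and the message keys are all assumptions chosen for the example. It simply counts skips per message and lists the messages skipped often enough to be worth a second look.

```python
from collections import Counter


class SkipTracker:
    """Count skipped messages and flag the frequently skipped ones."""

    def __init__(self, threshold=3):
        # Hypothetical cut-off: skips needed before a message is flagged.
        self.threshold = threshold
        self.skips = Counter()

    def record_skip(self, message_key):
        self.skips[message_key] += 1

    def rewording_candidates(self):
        """Message keys skipped at least `threshold` times, most skipped first."""
        return [
            key for key, count in self.skips.most_common()
            if count >= self.threshold
        ]


tracker = SkipTracker(threshold=3)
for key in ["msg-a", "msg-b", "msg-a", "msg-c", "msg-a", "msg-b"]:
    tracker.record_skip(key)

print(tracker.rewording_candidates())
```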

In the next sprint the team will work on polishing the translation interface further: better display of documentation, translation suggestions and diffs, better responsiveness, more robust language selection and other features.

In other Language Engineering news:

  • The December 2012 version of the MediaWiki Language Extension Bundle was released.
  • Better support for language variants and alternative language codes was added to the Universal Language Selector.

Amir E. Aharoni, Software Engineer (Internationalization)

Translation interface makeover in progress

Ei kannata mennä merta edemmäs kalaan. —a Finnish proverb

The Translate extension, a central piece in the puzzle that makes Wikipedia and the community around it massively multilingual, is getting a major overhaul.

“Translate”, as it’s commonly called, powers the translation of Wikipedia’s software, announcements, reports and fundraising banners, and of other sites and software projects. It focuses on making the translators’ work easy, efficient and, if possible, fun. The software gets frequent under-the-hood updates, and now the time has come for a major overhaul of its most visible part: the translation user interface.

Arun Ganesh and Pau Giner, from the Wikimedia Foundation’s Language Engineering team, have studied the current translation workflow by testing the software and interviewing translators. They drafted new interface ideas and tested experimental designs with users who speak different languages and have different levels of experience with the translation functionality.

In the team’s thirtieth two-week coding sprint, which ended last Tuesday, two major components of the overhaul have taken shape: the message group selector and the list of translatable messages.

The Message group selector. Message groups are groups of related translatable messages: a software project, a multilingual blog post or announcement, etc.

The group selector helps a translator find a project to translate using a tree-like structure of groups and sub-groups. Every project shows the completeness of the translation using a colorful progress bar. For quick and easy access to the projects that interest the translator, there’s a tab that shows recently used projects, and a responsive search function.

Listing of translatable interface messages for VisualEditor. Some messages are translated to French and some need review.

The list of strings to translate has been redesigned to improve clarity, making it easier to scan and distinguish between messages that are translated, untranslated and need to be updated (“fuzzy”).

The development of the improved user experience continues. In the next sprints, the team will complete these features and add new ones, such as an improved sign-up process and better search. Usability testing efforts will continue to ensure that the new designs provide an improved experience. If you are interested in trying the new translation tools, please volunteer for our usability testing sessions.

Other ways to connect with the Language Engineering team:

  • Pau Giner and I will present on multilingual user testing and internationalization “dos and don’ts” in the live broadcast Wikimedia Open Tech Chat on Thursday, December 13 at 20:30 UTC.
  • We’ll hold IRC Office Hours on Wednesday, December 17 at 17:30 UTC. Topics of discussion will be the translation user experience improvements, Universal Language Selector and general Q&A.

Amir E. Aharoni, Software Engineer (Internationalization)