Wikimedia blog

News from the Wikimedia Foundation and about the Wikimedia movement

Posts by Amir E. Aharoni

MediaWiki localization file format changed from PHP to JSON

Translations of MediaWiki’s user interface are now stored in a new file format—JSON. This change won’t have a direct effect on readers and editors of Wikimedia projects, but it makes MediaWiki more robust and open to change and reuse.

MediaWiki is one of the most internationalized open source projects. MediaWiki localization includes translating over 3,000 messages (interface strings) for MediaWiki core and an additional 20,000 messages for MediaWiki extensions and related mobile applications.

User interface messages, originally in English, and their translations have historically been stored in PHP files along with MediaWiki code. New messages and documentation were added in English and these messages were translated on translatewiki.net to over 300 languages. These translations were then pulled by MediaWiki websites using LocalisationUpdate, an extension MediaWiki sites use to receive translation updates.

So why change the file format?

The motivation to change the file format was driven by the need to provide more security, reduce localization file sizes and support interoperability.

Security: PHP files are executable code, so the risk of malicious code being injected is significant. In contrast, JSON files contain only data, which minimizes this risk.

Reducing file size: Some of the larger extensions have had multi-megabyte data files. Editing those files was becoming a management nightmare for developers, so the messages are now stored in one file per language instead of keeping all languages together in very large files.

Interoperability: The new format increases interoperability by allowing features like VisualEditor and Universal Language Selector to be decoupled from MediaWiki, because the JSON files can be used without MediaWiki. This was demonstrated earlier with the jquery.i18n library. This library, developed by Wikimedia’s Language Engineering team in 2012, has internationalization features that are very similar to what MediaWiki offers, but it is written fully in JavaScript and stores messages and message translations in JSON format. With LocalisationUpdate’s modernization, MediaWiki localization files are now compatible with those used by jquery.i18n.
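To illustrate the "data, not code" point and the one-file-per-language layout, here is a minimal sketch in Python. It is not MediaWiki's or jquery.i18n's actual loader: the sample messages, the function names `parse_messages` and `get_message`, and the simple fallback to English are all assumptions made for this example. It also assumes that keys starting with `@` (such as `@metadata`) hold information about the file and are not messages.

```python
import json

# Hypothetical sample data, one JSON document per language.
EN_JSON = """{
    "@metadata": {"authors": ["Example"]},
    "greeting": "Hello",
    "farewell": "Goodbye"
}"""
HE_JSON = """{
    "@metadata": {"authors": ["Example"]},
    "greeting": "שלום"
}"""


def parse_messages(json_text):
    """Parse one language file and drop keys starting with "@" (metadata)."""
    # json.loads only builds data structures; it never executes the content.
    data = json.loads(json_text)
    return {key: value for key, value in data.items() if not key.startswith("@")}


def get_message(catalogs, key, lang, fallback="en"):
    """Return the message in `lang`, or in `fallback` if it is untranslated."""
    if key in catalogs.get(lang, {}):
        return catalogs[lang][key]
    return catalogs[fallback][key]


catalogs = {"en": parse_messages(EN_JSON), "he": parse_messages(HE_JSON)}
```

With this data, `get_message(catalogs, "greeting", "he")` returns the Hebrew string, while `get_message(catalogs, "farewell", "he")` falls back to the English one because no Hebrew translation exists.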

An RFC on this topic was compiled and accepted by the developer community. In late 2013, developers from the Language Engineering and VisualEditor teams at Wikimedia collaborated to figure out how MediaWiki could best process messages from JSON files. They wrote a script for converting PHP to JSON, made sure that MediaWiki’s localization cache worked with JSON, and updated the LocalisationUpdate extension for JSON support.

Siebrand Mazeland converted all the extensions to the new format. This project was completed in early April 2014, when MediaWiki core switched over to processing JSON, creating the largest MediaWiki patch ever in terms of lines of code. The localization formats are documented on mediawiki.org, and MediaWiki’s general localization guidelines have been updated as well.

As a side effect, code analyzers like Ohloh no longer report skewed numbers for lines of PHP code, making metrics like comment ratio comparable with other projects.

Work is in progress on migrating other localized strings, such as namespace names and MediaWiki magic words. These will be addressed in a future RFC.

This migration project exemplifies collaboration at its best between many MediaWiki engineers contributing to this project. I would like to specially mention Adam Wight, Antoine Musso, David Chan, Ed Sanders, Federico Leva, James Forrester, Jon Robson, Kartik Mistry, Niklas Laxström, Raimond Spekking, Roan Kattouw, Rob Moen, Sam Reed, Santhosh Thottingal, Siebrand Mazeland and Timo Tijhof.

Amir Aharoni, Interim PO and Software Engineer, Wikimedia Language Engineering Team

Restoring the forgotten Javanese script through Wikimedia

There are several confusing and surprising things about the Javanese language. First, a lot of people confuse it with Japanese, or with Java, a programming language. Also, with over eighty million speakers, it is one of the ten most widely spoken languages in the world, yet it is not an official language in any country or territory.

Illuminated manuscript of Babad Tanah Jawi (History of the Javanese Land) from the 19th century.

Javanese is mainly spoken in Indonesia, on the island of Java, which gave its name to a popular variety of coffee. The only official language of that country is Indonesian, but Javanese is the main spoken language in its area. It is used in business, politics and literature. In fact, its literary tradition goes back to the tenth century, when an encyclopedia-like work titled Cantaka Parwa was written in it. Another Javanese encyclopedia was published in the nineteenth century, titled Bauwarna.

This tradition is being continued today by Wikipedians who speak that language: every day they strive to improve and enhance the Javanese Wikipedia, which now has over forty thousand articles. One of them is Benny Lin. In addition to writing articles and explaining the Wikipedia mission to people, Benny’s special passion is making the Javanese language usable online not just in the more prevalent Latin alphabet but also in the ancient Javanese script.

This ancient script, also known as Carakan, was used for over a thousand years, and numerous books have been published in it. These days there’s little book publishing in it, though it is still used in some textbooks, in some Facebook groups and on public signs. Elsewhere the Latin alphabet is used more frequently. The younger generation is starting to forget the old script, and this rich heritage is becoming inaccessible. Benny hopes that transcribing classical literature for Wikisource and writing modern encyclopedic articles in this script will revive interest in it, help the Javanese people achieve greater understanding of their own culture, and make these largely unknown treasures of wisdom accessible to people of all languages and cultures.

Javanese Wikipedia article about Joko Widodo

Benny presented a talk about this at Wikimania in Hong Kong, the international gathering of Wikipedians. There he also worked with Santhosh Thottingal and me, developers from Wikimedia’s Language Engineering team, to improve the support for the Javanese script in Wikipedia. Thanks to this work, Wikipedias in all languages can now show text in the Javanese script, and readers don’t have to install any fonts on their computers, because the fonts are delivered using webfonts technologies. The exquisite Javanese script has many ligatures and other special features, which require the Graphite technology for displaying. As of this writing, the only web browser that supports it is Firefox, but Graphite is Free Software, and it may become supported in other browsers in the near future.

Benny also completed his work on Javanese typing tools for Wikipedia, so now the script can not only be read, but also written easily. This technology can even be used on other sites and not just Wikipedia, using the jquery.ime library.

He sees his work as part of a larger effort by many people who care about the script. There are others who design fonts, promote the script in different venues and research its literature. Benny saw that he could contribute by making the fonts and typing tools more accessible through Wikipedia, and he just did it.

Wikimedians believe that the sum of all knowledge must be freely shared by all humans, and this means that it must also be shared in all languages. Passionate volunteers like Benny are the people who make this happen.

Amir E. Aharoni
Software Engineer, Language Engineering team

Translate the user interface of Wikipedia’s new VisualEditor

The VisualEditor beta release is being gradually rolled out to all Wikipedia editors in all languages. This is one of the most exciting developments in the history of Wikipedia, because it will make editing the site accessible to the general public, rather than just to the people who have the patience to learn Wikipedia’s arcane markup language.

To make this accessibility really complete, however, the VisualEditor’s user interface needs to be completely translated to all the languages in which there is a Wikipedia. Its interface includes over a hundred new strings, and if they aren’t translated, they will appear in a foreign language on that Wikipedia (e.g. English text on the Polish Wikipedia).

Take a look at the translation statistics for the VisualEditor. As you can see, the translation to a lot of important languages is far from complete or entirely absent: Arabic, Portuguese, Hindi, Swahili, Hungarian, Bulgarian, Tagalog, Urdu, Lithuanian, and many others. If you know a language in that list and the translation to it is not at 100 percent, please click the language name and complete the translation. (You’ll have to create an account at translatewiki.net, if you don’t have one already.)

The article “Vilnius” in the Lithuanian Wikipedia, being edited in the VisualEditor. Note that most of the buttons are written in Lithuanian, but the buttons on the toolbar are in English: “Edit source”, “Page settings”, “Cancel”, “Save page”, “Paragraph”. These buttons weren’t translated yet, so they are unusable for people who don’t know English.

Even if the translation to your language is currently complete, please check your language’s page every few days—the VisualEditor beta is in very active development, the messages to translate are updated literally every day, and you want your language to be at 100 percent all the time.

This is also an opportunity to thank the hundreds of translatewiki.net contributors, who work quietly, but persistently, and make MediaWiki and its extensions into one of the most thoroughly localized pieces of software ever.

If you haven’t joined the translatewiki.net community yet, you are very welcome!

Amir E. Aharoni
Software Engineer, Language Engineering team, Wikimedia Foundation

First Wikimedia hackathon in Tel Aviv, Israel

On Thursday, 23 May, just one day before the big Wikimedia hackathon in Amsterdam, Wikimedia Israel held its first hackathon in Tel-Aviv.

Israel has a thriving software industry, as well as a healthy Wikipedia editing community. Despite this, there are relatively few software developers in Israel who work on Wikimedia-related projects, so the primary purpose of this event was to show new people who are skilled in programming and web design how they can contribute their talents to our free knowledge projects.

Wikimedia Israel already organized a hackathon as part of the Wikimania 2011 conference, which was held in Haifa, but this was the first time that such an event was produced in Israel independently of other events.

Google Israel kindly gave us the venue – the hacking space in their Tel-Aviv Campus building, which is perfect for such events: cozy, simple, with comfortable tables, a lot of power strips and good wifi. About thirty people showed up for the event. Their skills were varied and quite surprising. There were not just PHP and JavaScript developers – these languages being the most important in MediaWiki – but also experts in DevOps, integration testing, Python scripting, data visualizations and design.

In the best hackathon style, the event focused less on talks and more on code, but I was very happy to host one guest talk by Mushon Zer-Aviv, a developer of the freely licensed Alef font, designed as a modern Hebrew and Latin typeface for the web.

So, most importantly, what did the event accomplish? Among other things: fixes for two MediaWiki bugs, both made by new developers; improved automatic tests for JavaScript components; a prototype for a script that enriches Wikipedia with data from Open Knesset, a database of information about the Israeli parliament based on open-source technology; and a new template in Lua, also made by a developer who is completely new to the language. I had the feeling that most of the participants became genuinely interested in joining the community of MediaWiki developers.

I want to use this opportunity to give my very sincere thanks to the people who helped me organize the event: Chen Davidi, Itzik Edri and Dorit Shafir-Diamant, who were instrumental in organizing the event’s logistics; Michal from Google Israel for providing the venue; and also to Yair Talmor, Chezi Reshef, Yael Meron, Elad Alfassa, Oren Held, Moshe Nachmias and Yair Podemasky, who very kindly volunteered to help with setting up the venue, handled the registration and cleaned up at the end of the day.

The event was very satisfying, and we hope to have another one soon!

Amir E. Aharoni, Wikimedia Israel

Language Engineering: Progress With Input Methods and Translation Editor

Batti il ferro finché è caldo —an Italian proverb

In its last two-week sprint, the Wikimedia Language Engineering team worked with developers from other teams to improve keyboard support and continued working on the new user interface for the Translate extension.

Input methods: More languages and support for mobile devices added

jquery.ime, Wikimedia’s portable keyboard layouts library, got boosts from two sources during the last sprint.

Yuvi Panda from the mobile team refactored jquery.ime to make it usable on mobile phones. Now, over 60 keyboard layouts that are supported by jquery.ime will also be usable on Android mobile phones. If you’d like to try an early testing version of the mobile keyboard layouts and help develop them, head to the mobile keyboard layouts GitHub repository.

Engineers from Red Hat also joined the input method development effort and added new and improved layouts for the Gujarati, Punjabi, Tamil, Malayalam, Kannada and Telugu languages, spoken by millions of people in India.

If a keyboard layout for your language is missing, you can send a pull request to the main jquery.ime repository.

Progress on translation editor user experience

The team continued fixing and improving the new translation editor, getting it ready to release. Some of the recent improvements include:

  • The most relevant translation memory suggestions are shown at the top.
  • Messages are loaded automatically when the user scrolls to the bottom of the page.
  • The status bar at the bottom of the page shows information about the status of the translations.
  • Recently translated projects are now displayed correctly in the project selector.
  • Discouraged translation projects are omitted from the group selector.
  • Message documentation can now be edited inside the translation editor.
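The first improvement in the list above, showing the most relevant translation memory suggestions at the top, can be sketched as follows. This is only an illustration and not the Translate extension's own algorithm: it scores past source strings with `difflib.SequenceMatcher` from the Python standard library as a stand-in similarity measure, and the function name `rank_suggestions` and the sample memory are made up for the example.

```python
from difflib import SequenceMatcher


def rank_suggestions(source_text, memory, limit=3):
    """Return up to `limit` (score, translation) pairs, best match first."""
    scored = []
    for past_source, past_translation in memory:
        # ratio() is 1.0 for identical strings and approaches 0.0 for unrelated ones.
        score = SequenceMatcher(None, source_text, past_source).ratio()
        scored.append((score, past_translation))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:limit]


# Hypothetical translation memory: (English source, French translation).
memory = [
    ("Log in", "Se connecter"),
    ("Delete page", "Supprimer la page"),
    ("Save page", "Enregistrer la page"),
]
best = rank_suggestions("Save the page", memory)
```

Here the translation of "Save page" is ranked first for the new string "Save the page", because its source text is the closest match.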

A video showing some of the recently deployed features in action: most relevant translation suggestions are shown first; inline translation documentation editor; automatic loading of messages when scrolling; experimental faceted search page for translations.

The features that were already implemented were also tested with real users by the team’s interaction designer Pau Giner. The issues that the users reported were noted and will be fixed in the coming sprints.

Niklas Laxström implemented faceted search for translations using the Free Apache Solr engine and deployed an experimental version of the translation search on the testing site. He also gave an open presentation about Solr and its upcoming use in translatewiki.net. You can watch Niklas’ presentation about Solr on YouTube.

Next week some of the team members are going to participate in the FOSDEM conference, and after that—the 2nd Language Summit in the Red Hat offices in Pune.

Amir Aharoni, Software Engineer (Internationalization), Language Engineering team

A more efficient translation interface

De mica en mica s’omple la pica i de gota en gota s’omple la bota. —a Catalan proverb

During its most recent development sprint, the Wikimedia Foundation’s Language Engineering team continued to improve the user experience of the Translate extension to make it as smooth and efficient as possible. Highlights include:

  • Pressing the “Save” button immediately shows the next string to translate, while the saving is performed in the background.
  • When progressing to the next translation, the page smoothly scrolls up.
  • Explanations about translatable strings are shown beside the corresponding message in a convenient box, which becomes expandable if the documentation is too long.
  • Machine Translation was made available for suggested translations.
  • The differences between older versions of translatable strings are also shown in a new expandable box.
  • The Language Selector API was updated to allow displaying all the documentation strings.
  • The Solr search engine schema was tweaked to make searching translatable strings more efficient and feature-rich by offering faceted search.
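The faceted search mentioned in the last item above means that, next to the matching strings, the search reports how many matches fall under each value of a field such as language or message group, so that the user can narrow the results. The following is a minimal in-memory sketch of that idea in plain Python. It does not use Solr or its API, and the field names, the sample entries and the function name `faceted_search` are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical index entries; the field names are illustrative only.
MESSAGES = [
    {"text": "Save page", "language": "en", "group": "core"},
    {"text": "Enregistrer la page", "language": "fr", "group": "core"},
    {"text": "Page settings", "language": "en", "group": "visualeditor"},
    {"text": "Paramètres de la page", "language": "fr", "group": "visualeditor"},
    {"text": "Log in", "language": "en", "group": "core"},
]


def faceted_search(query, filters=None, facet_fields=("language", "group")):
    """Return matching entries plus, for each facet field, a count per value."""
    filters = filters or {}
    hits = [
        entry for entry in MESSAGES
        if query.lower() in entry["text"].lower()
        and all(entry[field] == value for field, value in filters.items())
    ]
    facets = {field: Counter(entry[field] for entry in hits) for field in facet_fields}
    return hits, facets


hits, facets = faceted_search("page")
french_hits, _ = faceted_search("page", filters={"language": "fr"})
```

Searching for "page" returns four entries and facet counts per language and per group; passing `filters={"language": "fr"}` narrows the same query to the French entries.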

Below is a brief demo of the latest features of the translation editor in action. It shows the Etherpad Lite project being translated into Russian.

The Language team also continues to work on squashing bugs and adding prioritized features. You can check out the latest bleeding edge version of the translation editor on translatewiki.net, or go back to the stable translation editor. Please report Translate bugs in Bugzilla.

Amir E. Aharoni, Software Engineer (Internationalization)

Translation editor growing snazzier

(Emor me’at ve-ase harbe. —a Hebrew proverb)

The Wikimedia Foundation’s Language Engineering team is continuing the makeover of the Translate extension, which started taking shape in early December. (Introduced in 2011, this MediaWiki feature powers the translation of Wikipedia’s software, announcements, reports and fundraising banners, and of other sites and software projects.)

During its latest two-week sprint, the team improved the actual interface used for submitting and editing translations:

A screenshot of the new work-in-progress translation editor

Information about the message and translations to other languages are now shown in a collapsible box on the right side of the translation area. Warnings about potential errors in the message are shown in a small box above the editing area, which is expandable, too.

The functionality for saving and skipping messages was updated. Usability testing observations by Arun Ganesh and Pau Giner suggest that users facing a hard part in a translation are more likely to just skip it than to report the problem. Because of this, skipping a message is now recorded and frequently skipped messages will be considered for re-wording.

In the next sprint the team will work on polishing the translation interface further: better display of documentation, translation suggestions and diffs, better responsiveness, more robust language selection and other features.

In other Language Engineering news:

  • The December 2012 version of the MediaWiki Language Extension Bundle was released.
  • Better support for language variants and alternative language codes was added to the Universal Language Selector.

Amir E. Aharoni, Software Engineer (Internationalization)

Translation interface makeover in progress

Ei kannata mennä merta edemmäs kalaan. —a Finnish proverb

The Translate extension, a central piece in the puzzle that makes Wikipedia and the community around it massively multilingual, is getting a major overhaul.

“Translate”, as it’s commonly called, powers the translation of Wikipedia’s software, announcements, reports and fundraising banners, and of other sites and software projects. It focuses on making the translators’ work easy, efficient and, if possible, fun. The software gets frequent under-the-hood updates, and now the time has come for a major overhaul of its most visible part: the translation user interface.

Arun Ganesh and Pau Giner, from the Wikimedia Foundation’s Language Engineering team, have studied the current translation workflow by testing the software and interviewing translators. They drafted new interface ideas and tested experimental designs with users who speak different languages and have different levels of experience with the translation functionality.

In the team’s thirtieth two-week coding sprint, which ended last Tuesday, two major components of the overhaul have taken shape: the message group selector and the list of translatable messages.

The Message group selector. Message groups are groups of related translatable messages: a software project, a multilingual blog post or announcement, etc.

The group selector helps a translator find a project to translate using a tree-like structure of groups and sub-groups. Every project shows the completeness of the translation using a colorful progress bar. For quick and easy access to the projects that interest the translator, there’s a tab that shows recently used projects, and a responsive search function.

Listing of translatable interface messages for the VisualEditor. Some messages are translated to French and some need review.

The list of strings to translate has been redesigned to improve clarity, making it easier to scan and distinguish between messages that are translated, untranslated and need to be updated (“fuzzy”).

The development of the improved user experience continues. In the next sprints, the team will complete these features and add new ones, such as an improved sign-up process and better search. Usability testing efforts will continue to ensure that the new designs provide an improved experience. If you are interested in trying the new translation tools, please volunteer for our usability testing sessions.

Other ways to connect with the Language Engineering team:

  • Pau Giner and I will present on multilingual user testing and internationalization “dos and don’ts” in the live broadcast Wikimedia Open Tech Chat on Thursday, December 13 at 20:30 UTC.
  • We’ll hold IRC Office Hours on Wednesday, December 17 at 17:30 UTC. Topics of discussion will be the translation user experience improvements, Universal Language Selector and general Q&A.

Amir E. Aharoni, Software Engineer (Internationalization)

Language engineering news: Bugs fixed in Universal Language Selector, and a new IPA keyboard layout

Imagine a world in which every single human being can easily select the language of the website that they are reading.

One of the bugs that were fixed: not all elements of the Universal Language Selector’s user interface were using web fonts.

That’s what the Wikimedia Foundation’s Language Engineering team has been working on through the Universal Language Selector (ULS): a reusable user interface component for comfortable selection of the most appropriate language out of a long list of available options. It integrates new features from Project Milkshake, a set of portable JavaScript tools for internationalizing any web application with web fonts, keyboard layouts and a robust mechanism for loading translations.

The Universal Language Selector is already used on translatewiki.net and on the new Wikidata project, two massively multilingual communities of software translators and data curators, who are testing this feature in an actual production environment, and reporting many bugs. After coming back from the Bangalore Developer Camp, the team set out to fix the last major bugs in the ULS, and most notably:

Now, all buttons use web fonts and are readable.

Currently, the Universal Language Selector supports 68 keyboard layouts and 44 web fonts, and the number is growing. New fonts and keyboards are added according to the needs of the readers and the editors’ communities around the world.

In other news:

  • We held Language Engineering office hours on November 21.
  • Web fonts support was deployed to the Persian Wikipedia, but unfortunately reverted after the users found several issues with font rendering. The team hopes to fix the problems and deploy web fonts again, for the benefit of all the users who do not have good fonts installed on their computers and devices.
  • Niklas Laxström created the first test release of the MediaWiki Language Extension Bundle, an easy-to-install package of stable versions of several MediaWiki extensions that improve its multilingual support. It keeps your MediaWiki site’s interface translations up-to-date and includes “language skills” boxes, rich locale data, easy translation of content pages and site interface, and the aforementioned UniversalLanguageSelector, which helps users select the language.
  • I created a keyboard mapping for easy typing in the International Phonetic Alphabet (IPA), based on the SIL IPA layout. The IPA is very commonly used as a pronunciation guide in Wikipedia and Wiktionary, and the deployment of the Universal Language Selector will make typing in IPA easier. Other IPA layouts may be easily added, for example X-SAMPA. You are very welcome to try this layout in translatewiki.net: click any text field, and select the English language and the SIL IPA layout in the keyboard layout pop-up.

    A screenshot of MediaWiki with jquery.ime and the word ‘milkshake’ written in IPA.
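The general idea behind such a keyboard mapping can be sketched as a list of rewrite rules that are applied to the end of the text after every keystroke. The sketch below is written in Python for illustration and is not jquery.ime's code or its rule format. The three rules are hypothetical examples and are not claimed to be the real SIL IPA mappings, and the function names `type_key` and `type_keys` are made up.

```python
import re

# Hypothetical rules for illustration; NOT the actual SIL IPA layout.
# Each rule is (pattern that must match at the end of the text, replacement).
RULES = [
    (r"s=$", "ʃ"),
    (r"n>$", "ŋ"),
    (r"e=$", "ə"),
]


def type_key(buffer, key, rules=RULES):
    """Append `key` to `buffer`, then rewrite the end of the text if a rule matches."""
    text = buffer + key
    for pattern, replacement in rules:
        new_text, count = re.subn(pattern, replacement, text)
        if count:
            return new_text
    return text


def type_keys(keys):
    """Simulate typing a sequence of keys one by one."""
    buffer = ""
    for key in keys:
        buffer = type_key(buffer, key)
    return buffer
```

With these sample rules, `type_keys("fis=")` produces "fiʃ", and text that matches no rule, such as "abc", is left unchanged.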

The team’s next sprint marks the beginning of a new release, during which we’ll start implementing a major overhaul of the user interface of translatewiki.net.

Amir E. Aharoni, Software Engineer (Internationalization)

Wikipedia Engineering DevCamp sees a lot of energy and contributions in Bangalore

On November 9-11, the Wikimedia Foundation held a developer meetup in Bangalore, India. The gathering provided an opportunity for India-based developers to work with the Foundation’s engineering teams on several projects, such as JavaScript-based language engineering tools, and mobile applications with PhoneGap and LAMP technologies.

The DevCamp focused on language engineering, mobile development, and user interaction and experience design (UI/UX). It was attended by more than 85 developers, UX/UI designers, Wikimedians and translators. The work sessions focused on developing various Wikimedia mobile apps as well as language tools. The first day of the DevCamp kicked off on Friday with two tutorials: “Developing mobile applications with PhoneGap” by Brion Vibber and “How to internationalize your code” by me. An interactive Q&A with a lot of challenging and interesting questions about both tutorials concluded the day.

The second day started off with Santhosh Thottingal introducing Project Milkshake (the team’s JavaScript-based internationalization libraries) and the Universal Language Selector currently under development. The mobile team introduced various mobile projects like native mobile apps, the mobile front-end, and the VUMI-based feature phone apps that power Wikipedia Zero. Interaction designer Pau Giner introduced design projects and guided new contributors. People started selecting projects they were interested in and teamed up with Wikimedia engineers. It was exciting to see some contributors make their first-ever open-source commits during the DevCamp. People continued to hack throughout the two days.

The final day of the DevCamp started with stand-up updates from all participants, and ended with demos and presentations of 18 projects by 25 presenters. One of the most lovely updates was presented by Lakshmi, who learned to type in her language, Malayalam, using the typing tools that Wikimedia engineers have developed.

A screenshot of a mathematical formula rendered using the MathJax library, with a context menu in the Tamil language.

Accomplishments at the DevCamp include contributions to language engineering projects, where contributors added unit tests to jquery.ime (the input method library for multiple language scripts), submitted bug fixes, and tested and actively reported bugs in jquery.ime and the Universal Language Selector. Another highlight was Brion Vibber’s integration of Universal Language Selector, WebFonts and support for language variants into the Wikipedia mobile app. One of the contributors, Ershad, built a Google Chrome extension based on the jquery.ime input method library and won a Wikimedia shoulder bag for it. Other highlights include patches submitted to MathJax (a library used to render mathematical equations on HTML pages) by Aditya Ravi Shankar and me to add internationalization support.


On the mobile platform, Swayam made enhancements to the Translate proofreading mobile app. Other mobile apps developed at the DevCamp include a Commons uploader and an app to track recent changes. Patches were also submitted to MobileFrontend and to an iOS client library, and a first working version of the Wikipedia FirefoxOS app was produced.

On the UI/UX design projects, participants worked on ideas for redesigning the translatewiki.net home page, the Mobile Universal language selector, Commons discovery and triaging apps. Here’s a complete list of demonstrations that were made at the Bangalore DevCamp; you are welcome to join the coding fun!

All in all, the DevCamp maintained a high energy level throughout the three days and produced a lot of new code, bug fixes, input keymaps, unit tests, mobile apps, translation UI and mobile designs, and positive collaboration across the board.

Amir E. Aharoni, Software Engineer (Internationalization)

Group photo on the lawn of the IIM Bangalore.