Translations of MediaWiki’s user interface are now stored in a new file format—JSON. This change won’t have a direct effect on readers and editors of Wikimedia projects, but it makes MediaWiki more robust and open to change and reuse.
MediaWiki is one of the most internationalized open source projects. MediaWiki localization includes translating over 3,000 messages (interface strings) for MediaWiki core and an additional 20,000 messages for MediaWiki extensions and related mobile applications.
User interface messages, originally written in English, and their translations have historically been stored in PHP files along with MediaWiki code. New messages and documentation were added in English, and these messages were translated on translatewiki.net into over 300 languages. These translations were then pulled by MediaWiki websites using LocalisationUpdate, an extension MediaWiki sites use to receive translation updates.
So why change the file format?
The change was motivated by the need to improve security, reduce localization file sizes and support interoperability.
Security: PHP files are executable code, so the risk of malicious code being injected is significant. In contrast, JSON files contain only data, which minimizes this risk.
Reducing file size: Some of the larger extensions had multi-megabyte data files. Editing those files was becoming a management nightmare for developers, so the messages were split into one file per language instead of storing all languages in a single large file.
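To make the two points above concrete, here is a minimal Python sketch of reading per-language JSON message files. It is an illustration, not MediaWiki's own loader: the function names, the directory layout (one `<language code>.json` file per language), the `example-` message keys and the simple fall-back-to-English rule are all assumptions made for this example. It also assumes that keys starting with "@", such as "@metadata", hold file metadata rather than messages.

```python
import json
import tempfile
from pathlib import Path


def load_messages(i18n_dir: Path, lang: str) -> dict:
    """Load one language's messages from <i18n_dir>/<lang>.json.

    Keys starting with "@" (for example "@metadata") are assumed to be
    file metadata rather than messages, so they are skipped.
    """
    path = i18n_dir / f"{lang}.json"
    if not path.exists():
        return {}
    with path.open(encoding="utf-8") as f:
        data = json.load(f)  # the file is only parsed as data, never executed
    return {k: v for k, v in data.items() if not k.startswith("@")}


def get_message(i18n_dir: Path, key: str, lang: str, fallback: str = "en") -> str:
    """Return the message in `lang`, else in `fallback`, else a placeholder.

    This single-step fallback is a simplification chosen for the sketch.
    """
    for code in (lang, fallback):
        messages = load_messages(i18n_dir, code)
        if key in messages:
            return messages[key]
    return f"<{key}>"


if __name__ == "__main__":
    # Hypothetical sample data: one small file per language.
    with tempfile.TemporaryDirectory() as tmp:
        i18n = Path(tmp)
        (i18n / "en.json").write_text(json.dumps({
            "@metadata": {"authors": ["Example Author"]},
            "example-greeting": "Welcome",
            "example-farewell": "Goodbye",
        }), encoding="utf-8")
        (i18n / "fi.json").write_text(json.dumps({
            "@metadata": {"authors": ["Example Translator"]},
            "example-greeting": "Tervetuloa",
        }), encoding="utf-8")

        print(get_message(i18n, "example-greeting", "fi"))  # translated message
        print(get_message(i18n, "example-farewell", "fi"))  # falls back to English
```

Because each language lives in its own small file, a translation update touches only that file, and reading it never involves running code from the file itself.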
An RFC on this topic was compiled and accepted by the developer community. In late 2013, developers from the Language Engineering and VisualEditor teams at Wikimedia collaborated to figure out how MediaWiki could best process messages from JSON files. They wrote a script for converting PHP to JSON, made sure that MediaWiki’s localization cache worked with JSON, and updated the LocalisationUpdate extension for JSON support.
Siebrand Mazeland converted all the extensions to the new format. This project was completed in early April 2014, when MediaWiki core switched over to processing JSON, creating the largest MediaWiki patch ever in terms of lines of code. The localization formats are documented on mediawiki.org, and MediaWiki’s general localization guidelines have been updated as well.
As a side effect, code analyzers like Ohloh no longer report skewed numbers for lines of PHP code, making metrics like comment ratio comparable with other projects.
Work is in progress on migrating other localized strings, such as namespace names and MediaWiki magic words. These will be addressed in a future RFC.
This migration project exemplifies collaboration at its best among the many MediaWiki engineers who contributed to it. I would especially like to mention Adam Wight, Antoine Musso, David Chan, Ed Sanders, Federico Leva, James Forrester, Jon Robson, Kartik Mistry, Niklas Laxström, Raimond Spekking, Roan Kattouw, Rob Moen, Sam Reed, Santhosh Thottingal, Siebrand Mazeland and Timo Tijhof.
Amir Aharoni, Interim PO and Software Engineer, Wikimedia Language Engineering Team