
Internationalization and localization

MediaWiki localization file format changed from PHP to JSON

Translations of MediaWiki’s user interface are now stored in a new file format—JSON. This change won’t have a direct effect on readers and editors of Wikimedia projects, but it makes MediaWiki more robust and open to change and reuse.

MediaWiki is one of the most internationalized open source projects. MediaWiki localization includes translating over 3,000 messages (interface strings) for MediaWiki core and an additional 20,000 messages for MediaWiki extensions and related mobile applications.

User interface messages, originally written in English, and their translations were historically stored in PHP files along with the MediaWiki code. New messages and their documentation were added in English, and the messages were translated on translatewiki.net into over 300 languages. These translations were then pulled by MediaWiki websites using LocalisationUpdate, an extension that MediaWiki sites use to receive translation updates.

So why change the file format?

The change was driven by the need for better security, smaller localization files and improved interoperability.

Security: PHP files are executable code, so the risk of malicious code being injected is significant. In contrast, JSON files are plain data, which minimizes this risk.

Reducing file size: Some of the larger extensions had accumulated multi-megabyte data files. Editing those files had become a management nightmare for developers, so the messages were split into one file per language instead of storing all languages in a single large file.

Interoperability: The new format makes it easier to decouple features like VisualEditor and the Universal Language Selector from MediaWiki, because JSON message files can be used without MediaWiki. This was demonstrated earlier by the jquery.i18n library. This library, developed by Wikimedia’s Language Engineering team in 2012, provides internationalization features very similar to MediaWiki’s, but it is written entirely in JavaScript and stores messages and their translations in JSON. With this change, MediaWiki localization files are now compatible with those used by jquery.i18n.
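
In practice, a per-language message file in the new format is simply a JSON object that maps message keys to message texts, with an optional "@metadata" block for information such as the translators’ names. The extension name and message keys below are invented for illustration; real files, typically kept in an extension’s i18n/ directory as en.json, fi.json and so on, follow the same shape, and a qqq.json file holds the message documentation for translators:

    {
        "@metadata": {
            "authors": [ "Example Translator" ]
        },
        "myextension-desc": "Adds an example feature to the wiki",
        "myextension-greeting": "Hello, $1!"
    }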

An RFC on this topic was written and accepted by the developer community. In late 2013, developers from the Language Engineering and VisualEditor teams at Wikimedia collaborated to figure out how MediaWiki could best process messages from JSON files. They wrote a script for converting PHP to JSON, made sure that MediaWiki’s localization cache worked with JSON, and updated the LocalisationUpdate extension for JSON support.

Siebrand Mazeland converted all the extensions to the new format. The project was completed in early April 2014, when MediaWiki core switched over to processing JSON, creating the largest MediaWiki patch ever in terms of lines of code. The localization formats are documented on mediawiki.org, and MediaWiki’s general localization guidelines have been updated as well.

As a side effect, code analyzers like Ohloh no longer report skewed numbers for lines of PHP code, making metrics like comment ratio comparable with other projects.

Work is in progress on migrating other localized strings, such as namespace names and MediaWiki magic words. These will be addressed in a future RFC.

This migration exemplifies collaboration at its best among the many MediaWiki engineers who contributed to it. I would like to specially mention Adam Wight, Antoine Musso, David Chan, Ed Sanders, Federico Leva, James Forrester, Jon Robson, Kartik Mistry, Niklas Laxström, Raimond Spekking, Roan Kattouw, Rob Moen, Sam Reed, Santhosh Thottingal, Siebrand Mazeland and Timo Tijhof.

Amir Aharoni, Interim PO and Software Engineer, Wikimedia Language Engineering Team

Modernising MediaWiki’s Localisation Update

Interface messages on MediaWiki and its many extensions are translated into more than 350 languages on translatewiki.net. Thousands of translations are created or updated each day. Usually, users of a wiki would have to wait until a new version of MediaWiki or of an extension is released to see these updated translations. However, webmasters can use the LocalisationUpdate extension to fetch and apply these translations daily without having to update the source code.

LocalisationUpdate provides a command line script to fetch updated translations. It can be run manually, but usually it is configured to run automatically using cron jobs (a sketch of a typical setup follows the list below). The sequence of events that the script follows is:

  1. Gather a list of all localisation files that are in use on the wiki.
  2. Fetch the latest localisation files from either:
    • an online source code repository, using https, or
    • clones of the repositories in the local file system.
  3. Check whether English strings have changed to skip incompatible updates.
  4. Compare all translations in all languages to find updated and new translations.
  5. Store the translations in separate localisation files.
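
As a rough sketch, a webmaster could schedule the script with a system cron entry along these lines. The installation path is hypothetical and the exact script options may differ between versions, so treat this as an illustration rather than a recipe:

    # /etc/cron.d/localisationupdate (hypothetical path and schedule)
    # Fetch and apply updated translations every night at 03:00.
    0 3 * * * www-data php /var/www/wiki/extensions/LocalisationUpdate/update.php --quiet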

MediaWiki’s localisation cache will automatically pick up the new translations via a hook that the LocalisationUpdate extension subscribes to.

Until very recently the localisation files existed in PHP format; they have now been converted to JSON. This update required changes in LocalisationUpdate to handle JSON files. Extending the code piecemeal over the years had made the code base tough to maintain, so it has been rewritten with extensibility in mind, both to support future development and to retain adequate support for older MediaWiki versions that use this extension.

The rewrite did not add any new features except support for the JSON format. The code for the existing functionality was refactored using modern development patterns such as separation of concerns and dependency injection. Unit tests were added as well.

The configuration format for the update scripts changed, but most webmasters won’t need to change anything, and will be able to use the default settings. Changes will be needed only on sites that for some reason don’t use the default repositories.

New features are being planned for future versions that would optimise LocalisationUpdate to run faster and without any manual configuration. Currently, the client downloads the latest translations for all extensions in all languages and then compares which translations can be updated. By moving some of the complex processing to a separate web service, the client can save bandwidth by downloading only updated messages for specific updated languages used by the reader.

There are still more things to improve in LocalisationUpdate. If you are a developer or a webmaster of a MediaWiki site, please join us in shaping the future of this tool.

Niklas Laxström and Runa Bhattacharjee, Language Engineering, Wikimedia Foundation

Webfonts: Making Wikimedia projects readable for everyone

Wikimedia wikis are available in nearly 300 languages, and some of them have pages with mixed-script content. An example is the page on the writing systems of India on the English Wikipedia. We expect users to be able to view this page in full and not see meaningless squares, also known as tofu. These tofu squares stand in for letters that the web browser on the reader’s computer cannot render. This can happen for several reasons:

  • The device does not have a font for the particular script;
  • The operating system or the web browser does not support the technology needed to render the characters;
  • The operating system or the browser supports the script only partially. For instance, because characters have been added to several scripts in recent Unicode versions, older fonts may not cover the new characters.

Fonts for most languages written in the Latin script are widely available on a variety of devices. However, languages written in other scripts often face obstacles when fonts on operating systems are unavailable, outdated, bug-ridden or aesthetically sub-optimal for reading content.

Using Webfonts with MediaWiki

To alleviate these shortcomings, the WebFonts extension was first developed and deployed to some wikis in December 2011. The underlying technology automatically delivers fonts to readers whose devices lack them, similar to how images in web pages are downloaded.

The old WebFonts extension was converted to the jquery.webfonts library, which was included in the Universal Language Selector, the extension that replaced the old WebFonts extension. Webfonts are applied using the jquery.webfonts library, and on Wikimedia wikis it is configured to use the fonts in the MediaWiki repository. The two important questions we need answered before this can be done are:

  1. Will the user need webfonts?
  2. If yes, which one(s)?

Webfonts are provided when:

  • Users have chosen to use webfonts in their user preference.
  • The font is explicitly selected in CSS.
  • Users viewing content in a particular language do not have the fonts on their local devices, or the devices do not display the characters correctly, and the language has an associated default font that can be used instead. Before the webfonts are downloaded, a test currently known as “tofu detection” is done to ascertain that the local fonts are indeed not usable. The default fonts are chosen by the user community.

Webfonts are not applied:

  • when users choose not to use webfonts, even if there exists a valid reason to use webfonts (see above);
  • in the text edit area of the page, where the user’s preference or browser settings are honored.

See image (below) for a graphical description of the process.

‘Tofu’ Detection

The font to be applied is chosen either by the name of the font family or, if the designated font family is not available, according to the language; in the latter case, the language’s default font takes priority. However, negotiating more complex selection options, such as font inheritance and fallback, adds to the challenge. For projects like Wikimedia, selecting appropriate fonts for inclusion is also a concern. The many challenges include the absence of well-maintained fonts, the limited number of freely licensed fonts, and the rejection of fonts by users for being sub-optimal.
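
The general idea behind tofu detection can be sketched in a few lines of JavaScript: measure the character in question and compare it with a code point that almost certainly has no glyph in any local font, since placeholder boxes at a given font size tend to share the same dimensions. This is only a simplified illustration of the technique, not the actual code used by jquery.webfonts or the Universal Language Selector:

    // Heuristic "tofu" check: if a character renders with the same width as a
    // code point that no local font should cover, it is probably being shown
    // as a placeholder box and a webfont is needed.
    function isTofu( character ) {
        var span = document.createElement( 'span' );
        span.style.position = 'absolute';
        span.style.visibility = 'hidden';
        span.style.fontSize = '72px';
        document.body.appendChild( span );

        // A Private Use Area code point, very unlikely to have a real glyph.
        span.textContent = '\uE01F';
        var tofuWidth = span.offsetWidth;

        // The character we actually want to display.
        span.textContent = character;
        var charWidth = span.offsetWidth;

        document.body.removeChild( span );
        return charWidth === tofuWidth;
    }

    // Hypothetical usage: decide whether a Javanese letter needs a webfont.
    if ( isTofu( '\uA98F' ) ) {
        // Download and apply a suitable webfont for the script.
    }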

Challenges to Webfonts

Merely serving the webfont is not the only challenge that this technology faces. The complexities are compounded for languages of South and South-East Asia and of Ethiopia, as well as for a few other scripts with nascent internationalization support. Font rendering and support for these scripts vary across operating system platforms. The inconsistency can stem from the technology used, such as the rendering engines, which can produce widely different results across browsers and operating systems. Santhosh Thottingal, a senior engineer on Wikimedia’s Language Engineering team who has been participating in recent work to make webfonts more efficient, outlines this in greater detail.

Checkbox in the Universal Language Selector preferences to download webfonts

A major impact is on bandwidth consumption and page load time, due to the additional overhead of delivering webfonts to millions of users. A recent consequence of this challenge was a change introduced in the Universal Language Selector (ULS) to prevent pages from loading slowly, particularly where bandwidth is at a premium. A checkbox now allows users to decide whether they would like webfonts to be downloaded.

Implementing Webfonts

Several clever solutions are currently in use to work around the known challenges. The webfonts are prepared with the aim of keeping their footprint comparatively small. For instance, Google’s sfntly tool, which uses MicroType Express compression, is used to create fonts in the EOT format (WOFF being the other widely used webfont format). However, the inherent demands of scripts with larger character sets cannot always be reduced efficiently. Caches are used to avoid unnecessary webfont downloads.

FOUT, or Flash Of Unstyled Text, occurs when the browser displays text in a different style, or no text at all, while waiting for the webfonts to load. Web browsers handle this differently, and optimizations are in the making. A possible solution in the near future is the in-development WOFF2 webfont format, which is expected to further reduce font sizes, improve performance and provide better font load events.

Special fonts like the Autonym font are used in places where known text, like a list of language names, is required to be displayed in multiple scripts. The font carries only the characters that are necessary to display the predefined content.

Additional optimizations at this point are directed towards improving the performance of the JavaScript libraries that are used.

Conclusion

Several technical solutions are being explored within Wikimedia Language Engineering and in collaboration with organizations with similar interests. Wikipedia’s sister project Wikisource attempts to digitize and preserve copyright-expired literature, some of which is written in ancient scripts. In these cases, as well as in others like accessibility support, webfonts technology allows fonts for special needs to be made available for wider use. The clear goal is to have readable text available for all users irrespective of language, script, device, platform, bandwidth, content and special needs.

For more information on implementing webfonts in MediaWiki, we encourage you to read and contribute to the technical documentation on mediawiki.org.

Runa Bhattacharjee, Outreach and QA coordinator, Language Engineering, Wikimedia Foundation

Language Engineering Events – Language Summit, Fall 2013

The Wikimedia Language Engineering team, along with Red Hat, organised the Fall edition of the Open Source Language Summit in Pune, India on November 18 and 19, 2013.

Members from the Language Engineering, Mobile, VisualEditor, and Design teams of the Wikimedia Foundation joined participants from Red Hat, Google, Adobe, Microsoft Research, Indic language projects, Open Source Projects (Fedora, Debian) and Wikipedians from various Indian languages. Google Summer of Code interns for Wikimedia Language projects were also present. The 2-day event was organised as work-sessions, focussed on fonts, input tools, content translation and language support on desktop, web and mobile platforms.

Participants at the Open Source Language Summit, Pune India

The Fontbook project, started during the Language Summit earlier this year, was slated to be extended to 8 more Indian languages. The project aims to create a technical specification for Indic fonts based upon the OpenType 1.6 specification. Pravin Satpute and Sneha Kore of Red Hat presented their work on the next version of the Lohit font family, based upon the same specification and using Harfbuzz-ng. This effort is expected to complement the goals of the Fontbook project.

The other font sessions included a walkthrough of the Autonym font created by Santhosh Thottingal, a Q&A session by Behdad Esfahbod about the state of Indic font rendering through Harfbuzz-ng, and a session to package webfonts for Debian and Fedora for native support. Learn more about the font sessions.

Improving input tools for multilingual input in the VisualEditor was extensively discussed. David Chan walked through the event logger system built for capturing IME input events; it is being used as an automated IME testing framework, available at http://tinyurl.com/imelog, to build a library of such events across IMEs, operating systems and languages.

Santhosh Thottingal stepped through several tough use cases of handling multilingual input, to support the VisualEditor’s need to provide non-native support for handling language content blocks within the contentEditable surface. Wikipedians from various Indic languages also provided their input. On-screen keyboards, mobile input methods like LiteratIM and predictive typing methods like ibus-typing-booster (available for Fedora) were also discussed. Read more about the input method sessions.

The Language Coverage Matrix Dashboard (LCMD), which displays the language support status of all languages in Wikimedia projects, was showcased. The Fedora Internationalization team, which currently provides resources for fewer languages than the Wikimedia projects, will identify the gap using the LCMD data and assess the resources that can be leveraged to enhance language support on desktops. Dr. Kalika Bali from Microsoft Research Labs presented on leveraging content translation platforms for Indian languages and highlighted that, for Indic languages, machine translation could be improved significantly by using web-scale content like Wikipedia.

Learn more about the sessions, accomplishments and next steps for these projects from the Event Report.

Runa Bhattacharjee, Outreach and QA coordinator, Language Engineering, Wikimedia Foundation

The Autonym Font for Language Names

When an article on Wikipedia is available in multiple languages, we see the list of those languages in a column on the side of the page. Each language name in the list is written in the language and script it refers to (a form known as the language autonym).

This also means that all the appropriate fonts are needed for the autonyms to be correctly displayed. For instance, an article like the one about the Nobel Prize is available in more than 125 languages and requires approximately 35 different fonts to display the names of all the languages in the sidebar.

Language Autonyms

Initially, this was handled by the native fonts available on the reader’s device. If a font was not present, the user would see square boxes (commonly referred to as tofu) instead of the name of a language. To work around this problem, not just for the language list, but for other sections in the content area as well, the Universal Language Selector (ULS) started to provide a set of webfonts that were loaded with the page.

While this ensured that more language names would be correctly displayed, the presence of so many fonts dramatically increased the weight of the pages, which therefore loaded much more slowly for users than before. To improve client-side performance, webfonts were set not to be used for the Interlanguage links in the sidebar anymore.

Removing webfonts from the Interlanguage links was the easy and immediate solution, but it also took us back to the sub-optimal multilingual experience that we were trying to fix in the first place. Articles may be perfectly displayed thanks to webfonts, but if a link is not displayed in the language list, many users will not be able to discover that there is a version of the article in their language.

Autonyms were not needed just for Interlanguage links. They were also required for the Language Search and Selection window of the Universal Language Selector, which allows users to find their language if they are on a wiki displaying content in a script unfamiliar to them.

Missing font or “tofu”

As a solution, the Language Engineers came up with a trimmed-down font that only contains the characters required to display the names of the languages supported in MediaWiki. It has been named the Autonym font and will be used when only the autonyms are to be displayed on the page. At just over 50 KB in size, it currently provides support for nearly 95% of the 400+ supported languages. The pending issues list identifies the problems with rendering and missing glyphs for some languages. If your language is missing glyphs and you know of an openly licensed font that can fill that void, please let us know so we can add it.

The autonym font addresses a very specific use case. There have been requests to explore the possibility of extending the use of this font to similar language lists, like the ones found on Wikimedia Commons. Within MediaWiki, the font can be used easily through a CSS class named autonym.
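
For example, a wiki page or gadget could wrap a list of language names in that class so the Autonym font is applied only to them. The snippet below is a hypothetical illustration of such markup, not taken from any particular wiki:

    <ul>
        <li><span class="autonym" lang="ja">日本語</span></li>
        <li><span class="autonym" lang="ru">Русский</span></li>
        <li><span class="autonym" lang="hi">हिन्दी</span></li>
    </ul>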

The Autonym font has been released for free use under the SIL Open Font License, Version 1.1.

Runa Bhattacharjee, Outreach and QA coordinator, Language Engineering, Wikimedia Foundation

Get introduced to Internationalization engineering through the MediaWiki Language Extension Bundle

The MediaWiki Language Extension Bundle (MLEB) is a collection of MediaWiki extensions for various internationalization features. These extensions and the Bundle are maintained by the Wikimedia Language Engineering team. Each month, a new version of the Bundle is released.

The MLEB gives webmasters who run sites with MediaWiki a convenient solution to install, manage and upgrade language tools. The monthly release cycle allows for adequate testing and compatibility across the recent stable versions of MediaWiki.

A plate depicting text in Sanskrit (Devanagari script) and Pali languages, from the Illustrirte Geschichte der Schrift by Johann Christoph Carl Faulmann

The extensions that form MLEB can be used to create a multilingual wiki:

  • UniversalLanguageSelector — allows users to configure their language preferences easily;
  • Translate — allows a MediaWiki page to be translated;
  • CLDR — is a data repository for language-specific locale data like date, time, currency etc. (used by the other extensions);
  • Babel — provides information about language proficiency on user pages;
  • LocalisationUpdate — keeps the translations of MediaWiki’s user interface up to date;
  • CleanChanges — shows RecentChanges in a way that reflects translations more clearly.

The Bundle can be downloaded as a tarball or from the Wikimedia Gerrit repository. Release announcements are generally made on the last Wednesday of the month, and details of the changes can be found in the Release Notes.

Before every release, the extensions are tested against the last two stable versions of MediaWiki on several browsers. Some extensions, such as UniversalLanguageSelector and Translate, need extensive testing due to their wide range of features. The tests are prepared as Given-When-Then scenarios, i.e. an action is checked for an expected outcome assuming certain conditions are met. Some of these tests are in the process of being automated using Selenium WebDriver and the remaining tests are run manually.
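
As an illustration, a Given-When-Then scenario for such a test could read like the following. This is a hypothetical example written in the Cucumber style used for the automated tests, not an actual scenario from the suite:

    Scenario: Searching for a language by its autonym
      Given I am on the main page of a test wiki
        And the Universal Language Selector is enabled
      When I open the language selector
        And I type "suomi" into the language search field
      Then I should see "suomi" in the list of results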

The automated tests currently run only on Mozilla Firefox. For the manual test runs, the Given-When-Then scenarios are replicated across several web browsers. These are mostly the Grade-A level supported browsers. Regressions or bugs are reported through Bugzilla. If time permits, they are also fixed before the monthly release, or otherwise scheduled to be fixed in the next one.

The MLEB release process offers several opportunities to participate in the development of internationalization tools. The testing workflow introduces participants to the features of the commonly used extensions. Finding and tracking bugs on Bugzilla familiarizes them with the bug lifecycle and provides an opportunity to work closely with the developers while the bugs are being fixed. Writing a patch to fix a bug is the next exciting step, and new participants are always encouraged to take it.

If you’d like to participate in testing, we now have a document that will help you get started with the manual tests. Alternatively, you could also help in writing the automated tests (using Cucumber and Ruby). The newest version of MLEB has been released and is now ready for download.

Runa Bhattacharjee
Outreach and QA coordinator, Language Engineering, Wikimedia Foundation

Language support at Wikimania 2013 in Hong Kong

With participants from more than 80 countries, Wikimania 2013 was a great opportunity for the Language Engineering team to meet users from very different language communities. We shared ideas about language tools with users from around the world, and the fact that Wikimania was held in Hong Kong this year was an opportunity to specifically discuss how our current and future tools support the Chinese language.

Extending language settings

During the developer days which preceded the conference, we discussed how the Universal Language Selector (ULS) could support languages with multiple variants and ordering methods:

  • Ordering and grouping. When ordering and grouping items in lists (such as pages in a category), different languages have different ordering rules. The problem arises when a language provides more than one way of ordering elements. In the case of Chinese, dozens of indexing schemes exist, based on criteria such as the number of strokes or the phonetic transcription into the Latin alphabet, as with Pinyin (a small illustration follows this list).
  • Variant selection. Chinese comprises many regional language varieties. In order to offer users a local experience with minimal duplication, the Chinese Wikipedia allows editors to annotate variant differences in articles so that readers can get the content adapted to their local variant.
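
The ordering problem can be illustrated with the browser’s standard Intl API, which also exposes multiple collations for Chinese. This is not how ULS or MediaWiki implement sorting; it is only a sketch of why a single sorting switch per language is not enough, and the available collations depend on the Unicode data shipped with the browser or runtime:

    // Sort the same list of Chinese words by two different collation schemes.
    var words = [ '北京', '上海', '广州' ];

    // Phonetic (Pinyin) order.
    var byPinyin = words.slice().sort(
        new Intl.Collator( 'zh-u-co-pinyin' ).compare
    );

    // Stroke-count order.
    var byStroke = words.slice().sort(
        new Intl.Collator( 'zh-u-co-stroke' ).compare
    );

    console.log( byPinyin, byStroke );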

Thanks to our conversations with Wikipedia editor and volunteer developer Liangent, we now better understand the context and the implications of those problems for the case of Chinese, which was key to informing our design process. By understanding the possible scenarios and the frequency of use of these features, we could better decide how prominently to present them to users. With this information, we extended the designs of the ULS to include both ordering and variant selection options, and could provide initial technical guidance on how to extend the modular ULS architecture to support the above features.

Extending the designs of the ULS to add language variant and sorting scheme selection (only when languages have more than one option for those).

Wikimedia projects support more than 300 languages with very different needs. As illustrated above, close collaboration with volunteers from the different language communities becomes essential to guarantee that all languages are properly supported. Please contact the Language Engineering team if you find that any particular aspect of your language is not properly supported in Wikimedia projects.


Restoring the forgotten Javanese script through Wikimedia

There are several confusing and surprising things about the Javanese language. First, a lot of people confuse it with Japanese, or with Java, a programming language. Also, with over eighty million speakers, it is one of the ten most widely spoken languages in the world, yet it is not an official language in any country or territory.

Illuminated manuscript of Babad Tanah Jawi (History of the Javanese Land) from the 19th century.

Javanese is mainly spoken in Indonesia, on the island of Java, which gave its name to a popular variety of coffee. The only official language of that country is Indonesian, but Javanese is the main spoken language in its area. It is used in business, politics and literature. In fact, its literary tradition goes back to the tenth century, when an encyclopedia-like work titled Cantaka Parwa was written in it. Another Javanese encyclopedia was published in the nineteenth century, titled Bauwarna.

This tradition is being continued today by Wikipedians who speak the language: every day they strive to improve and enhance the Javanese Wikipedia, which now has over forty thousand articles. One of them is Benny Lin. In addition to writing articles and explaining the Wikipedia mission to people, Benny’s special passion is making the Javanese language usable online, not just in the more prevalent Latin alphabet but also in the ancient Javanese script.

This ancient script, also known as Carakan, was used for over a thousand years, and numerous books have been published in it. These days there is little book publishing in it, though it is still used in some textbooks, in some Facebook groups and on public signs; elsewhere the Latin alphabet is used more frequently. The younger generation is starting to forget the old script, and this rich heritage is becoming inaccessible. Benny hopes that transcribing classical literature for Wikisource and writing modern encyclopedic articles in this script will revive interest in it, help the Javanese people achieve a greater understanding of their own culture, and make these largely unknown treasures of wisdom accessible to people of all languages and cultures.

Javanese Wikipedia article about Joko Widodo

Benny presented a talk about this at Wikimania in Hong Kong, the international gathering of Wikipedians. There he also worked with Santhosh Thottingal and me, developers from Wikimedia’s Language Engineering team, to improve support for the Javanese script in Wikipedia. Thanks to this work, Wikipedias in all languages can now show text in the Javanese script, and readers don’t have to install any fonts on their computers, because the fonts are delivered using webfonts technology. The exquisite Javanese script has many ligatures and other special features, which require the Graphite technology to display correctly. As of this writing, the only web browser that supports it is Firefox, but Graphite is Free Software, and it may become supported in other browsers in the near future.

Benny also completed his work on Javanese typing tools for Wikipedia, so the script can now not only be read but also written easily. This technology can even be used on sites other than Wikipedia, via the jquery.ime library.
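
For a site that already loads jQuery and the jquery.ime library with its input method rules, enabling the input methods is, roughly, a one-liner; the selector below is only an example, and the reader then picks the desired input method (such as a Javanese one) from the menu that appears next to the focused field:

    // Attach jquery.ime input methods to every text field on the page.
    // Assumes jquery.ime and its rules files have already been loaded.
    $( 'input, textarea' ).ime();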

He sees his work as part of a larger effort by many people who care about the script. There are others who design fonts, promote the script in different venues and research its literature. Benny saw that he could contribute by making the fonts and typing tools more accessible through Wikipedia, and he just did it.

Wikimedians believe that the sum of all knowledge must be freely shared by all humans, and this means that it must also be shared in all languages. Passionate volunteers like Benny are the people who make this happen.

Amir E. Aharoni
Software Engineer, Language Engineering team

Join the Language Engineering team at Wikimania in Hong Kong next week!

Wikimania 2013 Hong Kong

Wikimania, the largest gathering of the Wikimedia world, is just around the corner. In less than a week, hundreds of volunteers will be gathering in Hong Kong to share stories and talk about grand plans for the year ahead. The Language Engineering team of the Wikimedia Foundation also plans to join in! The team has been working consistently on improving internationalization support in MediaWiki tools and in Wikimedia projects. At Wikimania 2013, the team will be present to talk and exchange ideas about these projects.

Team members will be presenting technology sessions, organizing a translation sprint and showcasing at the DevCamp. At the Translation sprint workshop, participants will get a quick introduction to using the Translate extension on Wikimedia wikis and translatewiki.net. The Translate extension provides MediaWiki with essential features needed to do translation work. It can be used to translate simple content, wiki user interfaces and system messages.

In an interesting design session, our team’s interaction designer Pau Giner will present on the challenges and complexities of designing language-conscious user interfaces. In the talk titled Improving the user experience of language tools, he will look at two significant projects developed by the team, the Universal Language Selector (ULS) and Translate UX, as case studies.

Team engineers Amir Aharoni and Niklas Laxström will discuss possibilities for enhancing multilingual handling of multimedia meta-information on Wikimedia Commons in their session Multilingual Wikimedia Commons – what can we do about it? To find an answer, they will touch upon the advanced features in Translate, ULS and other tools that can be used to translate images, templates, descriptions and other graphics metadata to make Commons a truly multilingual wiki.

A technical session on improvements in MediaWiki internationalization (i18n), MediaWiki i18n getting data-driven and world-reusable, will be presented by Santhosh Thottingal and Niklas Laxström. They will talk about the preference for a data-driven approach over custom code, which helps code maintainability across multiple i18n frameworks; an example is the use of data from the Unicode Common Locale Data Repository (CLDR). They will also speak about the challenges and benefits of maintaining two code bases, the MediaWiki JavaScript i18n extension and the jquery.i18n library, for different developer audiences.

Language team members will take part in panel discussions covering WMF Agile practices and in the Ask the Developers session. Come meet us at any of these sessions and bring your toughest language software questions along!

Complete list of Language Engineering sessions.

Runa Bhattacharjee
Outreach and QA coordinator, Language Engineering, Wikimedia Foundation

Translate the user interface of Wikipedia’s new VisualEditor

The VisualEditor beta release is being gradually rolled out to all Wikipedia editors in all languages. This is one of the most exciting developments in the history of Wikipedia, because it will make editing the site accessible to the general public, rather than just to the people who have the patience to learn Wikipedia’s arcane markup language.

To make this accessibility really complete, however, the VisualEditor’s user interface needs to be completely translated into all the languages in which there is a Wikipedia. Its interface includes over a hundred new strings, and if they aren’t translated, they will appear in a foreign language on that Wikipedia (for example, English text on the Polish Wikipedia).

Take a look at the translation statistics for the VisualEditor. As you can see, the translation into many important languages is far from complete or entirely absent: Arabic, Portuguese, Hindi, Swahili, Hungarian, Bulgarian, Tagalog, Urdu, Lithuanian, and many others. If you know a language in that list and its translation is not at 100 percent, please click the language name and complete the translation. (You’ll have to create an account at translatewiki.net, if you don’t have one already.)

The article “Vilnius” in the Lithuanian Wikipedia, being edited in the VisualEditor. Note that most of the buttons are written in Lithuanian, but the buttons on the toolbar are in English: “Edit source”, “Page settings”, “Cancel”, “Save page”, “Paragraph”. These buttons weren’t translated yet, so they are unusable for people who don’t know English.

Even if the translation into your language is currently complete, please check your language’s page every few days—the VisualEditor beta is in very active development, the messages to translate are updated literally every day, and you want your language to be at 100 percent all the time.

This is also an opportunity to thank the hundreds of translatewiki.net contributors, who work quietly, but persistently, and make MediaWiki and its extensions into one of the most thoroughly localized pieces of software ever.

If you haven’t joined the translatewiki.net community yet, you are very welcome!

Amir E. Aharoni
Software Engineer, Language Engineering team, Wikimedia Foundation