Wikimedia blog

News from the Wikimedia Foundation and about the Wikimedia movement

Technology

News and information from the Wikimedia Foundation’s Technology department.

VisualEditor gadgets

This post was written by two recipients of Individual Engagement Grants. These grants are awarded by the Wikimedia Foundation and aim to support Wikimedians in completing projects that benefit the Wikimedia movement. The grantees of this project work independently from the Foundation in the creation of their project.

Directionality tool: an example of a useful site-specific button added to VisualEditor, which inserts an RTL mark

Many gadgets and scripts have been created by volunteers across Wikimedia projects, and many of them aim to improve the editing experience. For the past few months there has been a new VisualEditor interface for editing articles. The interface is still in “beta,” so Wikipedians have not yet adopted it on a large scale. We believe there are many missing features that, if incorporated, could expand the VisualEditor user base. The known unsupported features are core and extension features (such as timelines), but there are many less visible unsupported features as well: gadgets. Gadgets can extend and customize VisualEditor and introduce new functionality: they let more advanced users use more features (such as timelines), introduce project-specific workflows (such as deletion proposals), or make it easy to insert popular templates such as those for citing sources. Since there is no central repository for gadgets, there is no easy way to tell what gadgets exist across all wikis.

Our project aims to organize this mess: improve gadget sharing among communities and help bring gadgets built for the old editing interface to VisualEditor. As part of this project we have already:

  • Mapped all the gadgets (in any language) and created a list of the gadgets in the various projects, with popularity ratings across projects.
  • Based on this list we selected key gadgets, and rewrote them to support the new VisualEditor:
    • Spell checker (Rechtschreibpruefung) – Spell checking for common errors. Spelling mistakes are highlighted in red while writing!
    • Reftoolbar – helps editors add citation templates to articles.
    • Directionality tool – Adds a button that inserts an RTL mark, useful in right-to-left languages such as Arabic and Hebrew.
    • Common summaries – Adds two new drop-down boxes below the edit summary box in the save dialog, with some useful default summaries.
  • Based on our experience writing VE gadgets, we created a guide for VE gadget writers, which should help them extend VisualEditor with custom features. We hope it helps develop support for VisualEditor by making it more integrated with existing tools.

 


Wikimedia engineering report, March 2014

Major news in March includes:

  • an overview of webfonts, and the advantages and challenges of using them on Wikimedia sites;
  • a series of essays written by Google Code-in students who shared their impressions, frustrations and surprises as they discovered the Wikimedia and MediaWiki technical community;
  • Hovercards now available as a Beta feature on all Wikimedia wikis, allowing readers to see a short summary of an article just by hovering over a link;
  • a subtle typography change across Wikimedia sites for better readability, consistency and accessibility;
  • a recap of the upgrade and migration of our bug tracking software.

Note: We’re also providing a shorter, simpler and translatable version of this report that does not assume specialized technical knowledge.


Agile and Trello: The planning cycle

This blog series will focus on how the Wikipedia App Team uses Trello for its day-to-day, week-to-week, and sprint-to-sprint development cycle. In its early days, the Wikipedia App was an experimental project that needed flexibility to evolve. The team looked at a number of tools, such as Bugzilla, Mingle, and Trello, to wrangle our ever-growing to-do list. We found that most imposed a structure that was stifling rather than empowering, cumbersome rather than fun, and generally overkill for what we needed.

Trello looked attractive as it took no more than a couple of minutes to see its moving parts, was available on multiple platforms, and was simple to customize. We experimented with it and quickly found that we could make it do most of what we wanted.

For those unfamiliar with Trello, at its most basic level it’s a list of lists, and it functions incredibly well within an Agile framework. Trello uses the concepts of boards, lists, items, and subitems: boards contain lists, which contain items, which in turn contain subitems.
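
For readers who like to think in code, here is a minimal sketch of that hierarchy written as a small Python data model. It is purely illustrative: the class and field names are ours, not Trello’s API or schema.

    from dataclasses import dataclass, field
    from typing import List

    # Illustrative model of the board -> list -> item -> subitem hierarchy.
    # Names are hypothetical; this is not Trello's real API or data schema.

    @dataclass
    class SubItem:
        text: str            # e.g. one acceptance criterion in a card checklist
        done: bool = False

    @dataclass
    class Item:
        title: str                                   # e.g. a user story
        subitems: List[SubItem] = field(default_factory=list)

    @dataclass
    class TrelloList:
        name: str                                    # e.g. "In analysis"
        items: List[Item] = field(default_factory=list)

    @dataclass
    class Board:
        name: str                                    # e.g. "Backlog"
        lists: List[TrelloList] = field(default_factory=list)

    backlog = Board("Backlog", [
        TrelloList("In analysis", [
            Item("As a reader, I want offline search, so that I can use the app without data",
                 [SubItem("Search works with no connection")]),
        ]),
        TrelloList("Ready for prioritization"),
    ])
    print(backlog.lists[0].items[0].title)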

Here is how we use it:

Each idea starts out as a narrative or user story on our backlog board. Most of our stories are written in an “As a …, I want to …, so that …” format. This gives us a narrative justification for a unit of work rather than a list of technical requirements. Stories begin their life in the “In analysis” column, where the product manager (who acts as the product owner) vets the idea with other stakeholders, involves the Design team, and generally incubates the story. Anyone is welcome to add a story to this column.

When the product owner feels that a story has matured enough, they place it in the “ready for prioritization” column with any required design assets. As these stories increase in number, we begin to see the next sprint forming.

Within a couple of days, the team meets and the product manager discusses the theme of the upcoming sprint. A new sprint board is created and the product manager moves the 3–5 most important stories over for deeper analysis by the whole team. The team meets and collectively refines the story cards to have a clear set of acceptance criteria under the checklist column, flags stories that need additional design, and prioritizes them in top-down order.

Within a week’s time, the team meets again, but this time their goal is to estimate and do a final pass on each story card. We use a combination of Scrum for Trello and hat.jit.su to facilitate the estimation process. Once all stories have been estimated, the product manager re-prioritizes, checks against our sprint velocity, and the sprint is ready to start.

Thus at any point we have three active boards:

  • Backlog – where all stories start
  • Current Sprint – what developers are working on
  • Next Sprint – what’s coming up next

Next time we’ll see what happens from the developers’ standpoint during a sprint.

Tomasz Finc, Director of Mobile

Remembering Adrianne Wadewitz

Portrait of Adrianne Wadewitz at Wikimania 2012 in Washington, DC.

Each of us on the Wikipedia Education Program team is saddened today by the news of Adrianne Wadewitz’s passing. We know we share this sadness with everyone at the Wikimedia Foundation and so many in the Wikimedia and education communities. Our hearts go out to all of you, her family and friends. Today is a time for mourning and remembering.

Adrianne served as one of the first Campus Ambassadors for the Wikipedia Education Program (then known as the Public Policy Initiative). In this role, she consulted with professors, demonstrated Wikipedia editing and helped students collaborate with Wikipedia community members to successfully write articles. As an Educational Curriculum Advisor to the team, Adrianne blended her unique Wikipedia insight and teaching experience to help us develop Wikipedia assignments, lesson plans and our initial sample syllabus. Her work served as a base for helping university professors throughout the United States, and the world, use Wikipedia effectively in their classes.

Adrianne was also one of the very active voices in the Wikimedia community urging participation and awareness among women to tackle the project’s well-known gender gap. She was an articulate, kind, and energetic face for Wikipedia, and many know that her work helped bring new Wikipedians to the project. The Foundation produced a video exploring Adrianne’s work within the Wikipedia community in 2012.

Many in the Wikimedia community knew her from her exceptional and varied contributions, especially in the areas of gender and 18th-century British literature – in which she received a PhD last year from Indiana University, before becoming a Mellon Digital Scholarship Fellow at Occidental College. Since July of 2004, she had written 36 featured articles (the highest honor for quality on Wikipedia) and started over 100 articles – the latest being on rock climber Steph Davis.

Adrianne touched many lives as she freely shared her knowledge, expertise and passions with Wikipedia, her students, colleagues, friends and family. She will be deeply missed by all of us. Our condolences go out to her family during these very difficult times.

Rod Dunican
Director, Global Education

Wikipedia Education Program

  • See Adrianne’s user page on the English Wikipedia, her Twitter account, her home page and her blog at HASTAC (Humanities, Arts, Science and Technology Alliance and Collaboratory)
  • Wikipedians have begun to share their memories and condolences about Adrianne on her user talk page.
  • The leadership of the Wiki Education Foundation, where Adrianne was a board member, have also expressed their condolences.
  • Memorial post from HASTAC Co-founder Cathy Davidson.
  • Wikinews story on the passing of Adrianne Wadewitz.

Wikimedia’s response to the “Heartbleed” security vulnerability


Logo for the Heartbleed bug

On April 7th, a widespread issue in a central component of Internet security (OpenSSL) was disclosed. The vulnerability has now been fixed on all Wikimedia wikis. If you only read Wikipedia without creating an account, no action is required from you. If you have a user account on any Wikimedia wiki, you will need to log in again the next time you use your account.

The issue, called Heartbleed, would allow attackers to gain access to privileged information on any site running a vulnerable version of that software. Wikis hosted by the Wikimedia Foundation were potentially affected by this vulnerability for several hours after it was disclosed. However, we have no evidence of any actual compromise to our systems or our users’ information, and because of the particular way our servers are configured, it would have been very difficult for an attacker to exploit the vulnerability in order to harvest users’ wiki passwords.

After we were made aware of the issue, we began upgrading all of our systems with patched versions of the software in question. We then began replacing critical user-facing SSL certificates and resetting all user session tokens. See the full timeline of our response below.

All logged-in users send a secret session token with each request to the site. If a nefarious person were able to intercept that token, they could impersonate other users. Resetting the tokens for all users has the benefit of making all users reconnect to our servers using the updated and fixed version of the OpenSSL software, thus removing this potential attack.
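
As a rough illustration of why the reset works, consider this minimal Python sketch. It is not Wikimedia’s or MediaWiki’s actual session code; the point is simply that once the stored token is regenerated, any copy an attacker may have captured no longer matches.

    import secrets

    sessions = {}  # username -> current session token (hypothetical store)

    def issue_token(user):
        # Generate a fresh, unpredictable 256-bit token and store it.
        token = secrets.token_hex(32)
        sessions[user] = token
        return token

    def is_valid(user, presented_token):
        # Constant-time comparison against the currently stored token.
        return secrets.compare_digest(sessions.get(user, ""), presented_token)

    old = issue_token("alice")     # suppose this token was exposed via Heartbleed
    issue_token("alice")           # the global reset issues a new token
    print(is_valid("alice", old))  # False: the intercepted token is now useless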

We recommend changing your password as a standard precautionary measure, but we do not currently intend to enforce a password change for all users. Again, there has been no evidence that Wikimedia Foundation users were targeted by this attack, but we want all of our users to be as safe as possible.

Thank you for your understanding and patience.

Greg Grossmeier, on behalf of the WMF Operations and Platform teams

Timeline of Wikimedia’s response

(Times are in UTC)

April 7th:

April 8th:

April 9th:

April 10th:

Frequently Asked Questions

(This section will be expanded as needed.)

  • Why hasn’t the “not valid before” date on your SSL certificate changed if you have already replaced it?
    Our SSL certificate provider keeps the original “not valid before” date (sometimes incorrectly referred to as an “issued on” date) in any replaced certificates. This is not an uncommon practice. Aside from looking at the change to the .pem files linked above in the timeline, the other way of verifying that the replacement took place is to compare the fingerprint of our new certificate with that of our previous one; a short sketch of how to compute such a fingerprint is shown below.
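
As an illustration, the following Python snippet fetches the certificate a server currently presents and prints its SHA-256 fingerprint. The hostname is only an example, and the output should be compared against the fingerprints we published.

    import ssl
    import hashlib

    # Fetch the certificate presented by the server and print its SHA-256
    # fingerprint. Compare the result with the published fingerprint of the
    # replacement certificate.
    pem = ssl.get_server_certificate(("wikipedia.org", 443))
    der = ssl.PEM_cert_to_DER_cert(pem)
    digest = hashlib.sha256(der).hexdigest().upper()
    print(":".join(digest[i:i + 2] for i in range(0, len(digest), 2)))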

You can translate this blog post.


 


MediaWiki localization file format changed from PHP to JSON

Translations of MediaWiki’s user interface are now stored in a new file format—JSON. This change won’t have a direct effect on readers and editors of Wikimedia projects, but it makes MediaWiki more robust and open to change and reuse.

MediaWiki is one of the most internationalized open source projects. MediaWiki localization includes translating over 3,000 messages (interface strings) for MediaWiki core and an additional 20,000 messages for MediaWiki extensions and related mobile applications.

User interface messages, originally in English, and their translations have historically been stored in PHP files along with MediaWiki code. New messages and their documentation were added in English, and these messages were translated on translatewiki.net into over 300 languages. The translations were then pulled in by MediaWiki websites using LocalisationUpdate, an extension MediaWiki sites use to receive translation updates.

So why change the file format?

The change of file format was driven by the need to improve security, reduce localization file sizes and support interoperability.

Security: PHP files are executable code, so the risk of malicious code being injected is significant. In contrast, JSON files are only data, which minimizes this risk.

Reducing file size: Some of the larger extensions have had multi-megabyte data files. Editing those files was becoming a management nightmare for developers, so messages are now stored in one file per language instead of all languages in a single large file.

Interoperability: The new format increases interoperability by allowing features like VisualEditor and Universal Language Selector to be decoupled from MediaWiki, because JSON files can be used without MediaWiki. This was demonstrated earlier with the jquery.i18n library. This library, developed by Wikimedia’s Language Engineering team in 2012, had internationalization features very similar to what MediaWiki offers, but it was written fully in JavaScript and stored messages and message translations in JSON format. With LocalisationUpdate’s modernization, MediaWiki localization files are now compatible with those used by jquery.i18n.
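
To give a feel for the new format, here is a made-up example of a per-language message file in the JSON style shared by MediaWiki and jquery.i18n, loaded with a few lines of Python. The message keys and file contents are invented for illustration.

    import json

    # A made-up per-language localisation file ("en.json"): an "@metadata"
    # block plus plain key/value message strings with $1-style parameters.
    en_json = """
    {
        "@metadata": {
            "authors": ["Example Translator"]
        },
        "myextension-greeting": "Hello, $1!",
        "myextension-edit-count": "You have made $1 {{PLURAL:$1|edit|edits}}."
    }
    """

    messages = json.loads(en_json)
    metadata = messages.pop("@metadata")  # metadata is plain data, not executable code
    print(metadata["authors"])
    print(messages["myextension-greeting"].replace("$1", "world"))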

An RFC on this topic was compiled and accepted by the developer community. In late 2013, developers from the Language Engineering and VisualEditor teams at Wikimedia collaborated to figure out how MediaWiki could best process messages from JSON files. They wrote a script to convert PHP to JSON, made sure that MediaWiki’s localization cache worked with JSON, and updated the LocalisationUpdate extension for JSON support.

Siebrand Mazeland converted all the extensions to the new format. The project was completed in early April 2014, when MediaWiki core switched over to processing JSON, creating the largest MediaWiki patch ever in terms of lines of code. The localization formats are documented on mediawiki.org, and MediaWiki’s general localization guidelines have been updated as well.

As a side effect, code analyzers like Ohloh no longer report skewed numbers for lines of PHP code, making metrics like comment ratio comparable with other projects.

Work is in progress on migrating other localized strings, such as namespace names and MediaWiki magic words. These will be addressed in a future RFC.

This migration project exemplifies collaboration at its best among the many MediaWiki engineers who contributed. I would like to specially mention Adam Wight, Antoine Musso, David Chan, Ed Sanders, Federico Leva, James Forrester, Jon Robson, Kartik Mistry, Niklas Laxström, Raimond Spekking, Roan Kattouw, Rob Moen, Sam Reed, Santhosh Thottingal, Siebrand Mazeland and Timo Tijhof.

Amir Aharoni, Interim PO and Software Engineer, Wikimedia Language Engineering Team

A young developer’s story of discovery, perseverance and gratitude

This post is a discovery report written by Jared Flores and slightly edited for publication. It’s part of a series of candid essays written by Google Code-in students, outlining their first steps as members of the Wikimedia technical community. You can write your own.


When I initially heard of the Google Code-In (GCI) challenge, I wasn’t exactly jumping out of my seat. I was a little apprehensive, since the GCI sample tasks used languages such as Java, C++, and Ruby. While I’ve had my share of experience with the languages, I felt my abilities were too limited to compete. Yet, I’ve always had a fiery passion for computer science, and this challenge presented another mountain to conquer. Thus, after having filtered through the hundreds of tasks, I took the first step as a Google Code-In student.

The first task I took on was to design a share button for the Kiwix Android app, an offline Wikipedia reader. Though Kiwix itself wasn’t a sponsoring organization for GCI, it still provided a branch of tasks under the Wikimedia umbrella. With five days on the clock, I researched vigorously and studied the documentation for Android’s share API.

After a few hours of coding, the task seemed to be complete. Reading through the compiler’s documentation, I downloaded all of the listed prerequisites, then launched the Kiwix autogen bash file. But even with all of the required libraries installed, Kiwix still refused to compile. Analyzing the error logs, I encountered permission errors, illegal characters, missing files, and mismatched dependencies. My frustration growing, I even booted Linux from an old installation DVD, and tried compiling there. I continued this crazy cycle of debugging until 2 am. I would have continued longer had my parents not demanded that I sleep. The next morning, I whipped up a quick breakfast, and then rushed directly to my PC. With my mind refreshed, I tried a variety of new approaches, finally reaching a point when Kiwix compiled.

With a newly-found confidence, I decided to continue pursuing more GCI tasks. Since I had thoroughly enjoyed the challenge presented by Kiwix, I initially wanted to hunt down more of their tasks. However, finding that there weren’t many left, I gained interest in Kiwix’s supporting organization: Wikimedia. I navigated to Wikimedia’s GCI information page and began familiarizing myself with the organization’s mission.

“We believe that knowledge should be free for every human being. We prioritize efforts that empower disadvantaged and underrepresented communities, and that help overcome barriers to participation. We believe in mass collaboration, diversity and consensus building to achieve our goals. Wikipedia has become the fifth most-visited site in the world, used by more than 400 million people every month in more than 270 languages.” – About Us: Wikimedia (GCI 2013)

Reading through the last sentence once more, I realized the amazing opportunities that were ahead of me. Whenever I needed to touch up on any given topic, Wikipedia was always one of the top results. Moreover, Wikipedia had become a source of entertainment for me and my friends. We always enjoyed hitting up a random article, then using the given links to find our way to Pokémon, Jesus, or maybe even Abraham Lincoln: Vampire Hunter.

Eager to begin, I chose video editing as my second task for Wikimedia. I began the long endeavor of watching, reviewing, and editing the two forty-five minute clips. Despite the lengthy videos, I was quite amused in seeing the technical difficulties that the Wikimedia team encountered during their Google Hangout. It was also comforting to put human faces behind the Wikimedia mentors of Google Code-In.

As with my first task, the work itself sped by quickly. But also similar to Kiwix, I encountered some difficulties with the “trivial” part of the task. I had never worked with the wiki interface before, so the wiki structure was somewhat foreign. I only had a vague idea of how to create a page. I also didn’t know where to upload files, nor did I know how to create subcategories. Nonetheless, after observing the instructions in Wikipedia’s documentation, I finally managed to upload the videos. Marking the task as complete, I scouted for my third GCI task.

Unbeknownst to me, my third task for Wikimedia would also prove to be the most challenging so far. Since this task required me to modify the code, I requested developer access. With the help of Wikimedia’s instructions, I registered myself as a developer, generated a private key to use with their servers, and proceeded to download the source code.

Though my experience with Git was quite basic, MediaWiki provided an easy to follow documentation, which aided greatly in my efforts to download their repository. As I waited for the download to complete, I quickly set up an Apache server for a testing environment. Configuring the MediaWiki files for my server, I began the installation. Fortunately, MediaWiki’s interface was quite intuitive; the installer performed flawlessly with minimal user input.

“Off to a good start,” I chuckled quietly to myself, a grin spreading across my face. And with that statement I tempted fate, and my troubles began. Upon opening the code, I realized I couldn’t easily comprehend a single line. I had worked with PHP before, but this code was more advanced than anything I had written.

Running my fingers through my hair, I sighed in exasperation. I spent the next few hours analyzing the code, trying my best to decipher the functions. Suddenly, patterns began appearing and I started to recognize numerous functions. I tinkered with different modules until the code slowly unraveled.

Finally formulating a solution, my fingers moved swiftly across the keyboard, implementing the code with ease. Confident that I had tested my code well, I followed the instructions written in the GCI’s task description, and uploaded my very first patch to Gerrit.

I was surprised at how simple the upload was. But what especially surprised me was the immediate feedback from the mentors. Within a few minutes of the upload, MediaWiki developers were already reviewing the patch, making suggestions for improvement.

Thankful for their helpful input, I worked to implement the changes they suggested. Adding the finishing touches, I was ready to upload another patch. However, I was unsure whether I should upload a new Gerrit change or push to the same patch as before. Unclear about which step to take, I made the rookie error of uploading a new Gerrit commit.

My mistake quickly received a corrective response from Aude via the Gerrit comment system. While I initially felt embarrassed, I was also relieved that I didn’t have to work alone. In fact, I was thankful that the MediaWiki collaborators taught me how to do it right.

Checking out the link Aude had given me, I learned to squash the two commits together. However, when I tried to follow Aude’s instructions, I somehow managed to mix someone else’s code with my own. What’s even worse was I already pushed the changes to Gerrit, exposing my blunder publicly.

Had it been any normal day, I would’ve just been calm and tried my best to fix it. But it just so happened to be the Thanksgiving holiday (in the United States). I had to leave in a few minutes for a family dinner and I couldn’t bear the thought of leaving my patch in a broken state.

I felt about ready to scream. I abandoned my Gerrit patch, and navigated to the task page, ready to give up. But just as I was about to revoke my claim on the task, I remembered something Quim Gil had told another GCI student:

“They are not mistakes! Only versions that can be improved. Students learn in GCI, and all of us learn every day.”

Remembering this advice, I cleared my mind, ready to do whatever it would take, and learn while I was at it. And like an answer to my prayers, Hoo Man, another developer, posted a comment in Gerrit. He guided me through how I could return to my original patch and send my new improvements through. And more importantly, he motivated me to persevere.

I came into GCI as a passionate, yet undisciplined student. I’m thrilled that in joining this competition, the Wikimedia open source community has already helped me plant the seeds of discipline, perseverance, and collaboration. It’s no coincidence that my hardest task thus far was staged on Thanksgiving. Every year I express gratitude towards friends and family. But this year, Google Code-In and the Wikimedia community have made my gratitude list as well.

Jared Flores
2013 Google Code-in student



Migrating Wikimedia Labs to a new Data Center

As part of ongoing efforts to reduce our reliance on our Tampa, Florida data center, we have just moved Wikimedia Labs to EQIAD, the new data center in Ashburn, Virginia. This migration was a multi-month project and involved hard work on the part of dozens of technical volunteers. In addition to reducing our reliance on the Tampa data center, this move should provide quite a few benefits to the users and admins of Wikimedia Labs and Tool Labs.

Migration objectives

We had several objectives for the move:

  1. Upgrade our virtualization infrastructure to use OpenStack Havana;
  2. Minimize project downtime during the move;
  3. Stop relying on nova-network and start using Neutron;
  4. Convert the Labs data storage system from GlusterFS to NFS;
  5. Identify abandoned and disused Labs resources.

Upgrade and Minimize Downtime

Wikimedia Labs uses OpenStack to manage the virtualization back-end. The Tampa Labs install was running a slightly old version of OpenStack, ‘Folsom’. Folsom is more than a year old now, but OpenStack does not provide an in-place upgrade path that doesn’t require considerable downtime, so we’ve been living with Folsom to avoid disrupting existing Labs services.

Similarly, a raw migration of Labs from one set of servers to another would have required extensive downtime, as simply copying all of the data would be the work of days.

The solution to both 1) and 2) was provided by OpenStack’s multi-region support. We built an up-to-date OpenStack install (version ‘Havana’) in the Ashburn data center and then modified our Labs web interface to access both centers at once. In order to ease the move, Ryan Lane wrote an OpenStack tool that allowed users to simultaneously authenticate in both data centers, and updated the Labs web interface so that both data centers were visible at the same time.
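
The tool itself is OpenStack-specific, but the idea can be sketched roughly as follows. This is not that tool: the endpoints, tenant and credentials are placeholders, and the Keystone v2.0 token request shown is simply the authentication call of that era.

    import requests

    # Placeholder Keystone endpoints, one per data center.
    REGIONS = {
        "pmtpa": "https://keystone.tampa.example.org:5000/v2.0",
        "eqiad": "https://keystone.ashburn.example.org:5000/v2.0",
    }

    def get_token(auth_url, username, password, tenant):
        # Keystone v2.0 password authentication: POST /v2.0/tokens.
        body = {"auth": {"passwordCredentials": {"username": username,
                                                 "password": password},
                         "tenantName": tenant}}
        response = requests.post(auth_url + "/tokens", json=body, timeout=10)
        response.raise_for_status()
        return response.json()["access"]["token"]["id"]

    def authenticate_everywhere(username, password, tenant):
        # One login from the user's point of view, one token per data center.
        return {region: get_token(url, username, password, tenant)
                for region, url in REGIONS.items()}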

At this point (roughly a month ago), we had two different clouds running: one full and one empty. Because of a shared LDAP back-end, the new cloud already knew about all of our projects and users.

Two clouds, before migration

Then we called on volunteers and project admins for help. In some cases, volunteers built fresh new Labs instances in Ashburn. In other cases, instances were shut down in Tampa and duplicated using a simple copy script run by the Wikimedia Operations team. In either case, project functions were supported in both data centers at once so that services could be switched over quickly and at the convenience of project admins.

Two clouds, during migration

As of today, over 50 projects have been copied to or rebuilt in Ashburn. For those projects with uptime requirements, the outages were generally limited to a few minutes.

Switch to OpenStack Neutron

We currently rely on the ‘nova-network’ service to manage network access between Labs instances. Nova-network is working fine, but OpenStack has introduced a new network service, Neutron, which is intended to replace nova-network. We hoped to adopt Neutron in the Ashburn cloud (largely in order to avoid being stuck using unsupported software), but quickly ran into difficulties. Our use case (flat DHCP with floating IP addresses) is not currently supported in Neutron, and OpenStack designers seem to be wavering in their decision to deprecate nova-network.

After several days of experimentation, expedience won out and we opted to reproduce the same network setup in Ashburn that we were using in Tampa. We may or may not attempt an in-place switch to Neutron in the future, depending on whether or not nova-network continues to receive upstream support.

Switch to NFS storage

Most Labs projects have a shared project-wide volume for storing files and transferring data between instances. In the original Labs setup, these shared volumes used GlusterFS. GlusterFS is easy to administer and designed for use cases similar to ours, but we’ve been plagued with reliability issues: in recent months, the lion’s share of Labs failures and downtime were the result of Gluster problems.

When setting up Tool Labs last year and facing our many issues with GlusterFS, Marc-Andre Pelletier opted to set up a new NFS system to manage shared volumes for the Tool Labs project. This work has paid off with much-improved stability, so we’ve adopted a similar system for all projects in Ashburn.

Again, we largely relied on volunteers and project admins to transfer files between the two systems. Most users were able to copy their data over as needed, scping or rsyncing between Tampa and Ashburn instances. As a hedge against accidental data loss, the old Gluster volumes were also copied into backup directories in Ashburn using a simple script. The total volume of data copied was around 30 terabytes; given the many-week migration period, network bandwidth between locations turned out not to be a problem.
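
The backup script was nothing fancy; something along these lines captures the idea, though the hostnames and paths here are placeholders rather than the script we actually ran.

    import subprocess
    from pathlib import Path

    GLUSTER_ROOT = Path("/mnt/gluster/projects")     # placeholder source root
    BACKUP_HOST = "labstore.eqiad.example.org"       # placeholder destination host
    BACKUP_ROOT = "/srv/backup/gluster"

    def backup_project(project: str) -> None:
        src = GLUSTER_ROOT / project
        dest = f"{BACKUP_HOST}:{BACKUP_ROOT}/{project}/"
        # rsync -a preserves permissions and timestamps; re-running only sends
        # changes, which matters when copying ~30 TB over several weeks.
        subprocess.run(["rsync", "-a", f"{src}/", dest], check=True)

    for path in sorted(GLUSTER_ROOT.iterdir()):
        if path.is_dir():
            backup_project(path.name)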

Identify and reclaim wasted space

Many Labs projects and instances are set up for temporary experiments, and have a short useful life. The majority of them are cleaned up and deleted after use, but Labs still has a tendency to leak resources as the odd instance is left running without purpose.

We’ve never had a very good system for tracking which projects are or aren’t in current use, so the migration was a good opportunity to clean house. For every project that was actively migrated by staff or volunteers, another project or two simply sat in Tampa, unmentioned and untouched. Some of these projects may yet be useful (or might have users but no administrators), so we need to be very careful about prematurely deleting them.

Projects that were not actively migrated (or noticed, or mentioned) during the migration period have been ‘mothballed’. That means that their storage and VMs were copied to Ashburn, but are left in a shutdown state. These instances will be preserved for several months, pending requests for their revival. Once it’s clear that they’re fully abandoned (in perhaps six months), they will be deleted and the space reused for future projects.

Conclusions

In large part, this migration involved a return to older, more tested technology. I’m still hopeful that in the future Labs will be able to make use of more fundamentally cloud-designed technologies like distributed file shares, Neutron, and (in a perfect world) live instance migration. In the meantime, though, the simple approach of setting up parallel clouds and copying things across has gone quite well.

This migration relied quite heavily on volunteer assistance, and I’ve been quite charmed by how gracious the vast majority of volunteers were about this inconvenience. In many cases, project admins regarded the migration as a positive opportunity to build newer, cleaner projects in Ashburn, and many have expressed high hopes for stability in the new data center. With a bit of luck we’ll prove this optimism justified.

Andrew Bogott, DevOps Engineer

Modernising MediaWiki’s Localisation Update

Interface messages on MediaWiki and its many extensions are translated into more than 350 languages on translatewiki.net. Thousands of translations are created or updated each day. Usually, users of a wiki would have to wait until a new version of MediaWiki or of an extension is released to see these updated translations. However, webmasters can use the LocalisationUpdate extension to fetch and apply these translations daily without having to update the source code.

LocalisationUpdate provides a command line script to fetch updated translations. It can be run manually, but usually it is configured to run automatically using cron jobs. The sequence of events that the script follows is:

  1. Gather a list of all localisation files that are in use on the wiki.
  2. Fetch the latest localisation files from either:
    • an online source code repository, using https, or
    • clones of the repositories in the local file system.
  3. Check whether English strings have changed to skip incompatible updates.
  4. Compare all translations in all languages to find updated and new translations.
  5. Store the translations in separate localisation files.

MediaWiki’s localisation cache will automatically find the new translations via a hook subscribed by the LocalisationUpdate extension.
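
The heart of steps 3 and 4 is a dictionary comparison. The simplified Python sketch below is not the extension’s actual code (LocalisationUpdate is written in PHP), but it shows the logic: skip messages whose English source string changed, and keep translations that are new or updated upstream.

    def updatable_messages(local_en, remote_en, local_xx, remote_xx):
        """Return the translations that can safely be updated for one language.

        local_en/remote_en: English messages shipped with the wiki vs. upstream.
        local_xx/remote_xx: translations in some language, local vs. upstream.
        """
        updates = {}
        for key, remote_translation in remote_xx.items():
            # Step 3: if the English source string changed, the new translation
            # may describe different behaviour, so skip it as incompatible.
            if remote_en.get(key) != local_en.get(key):
                continue
            # Step 4: keep translations that are new or changed upstream.
            if local_xx.get(key) != remote_translation:
                updates[key] = remote_translation
        return updates

    print(updatable_messages(
        {"search": "Search"}, {"search": "Search"},
        {"search": "Etsi"}, {"search": "Hae"},
    ))  # {'search': 'Hae'}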

Until very recently, the localisation files were in PHP format. They have now been converted to JSON format, which required changes in LocalisationUpdate to handle JSON files. Extending the code piecemeal over the years had made the code base tough to maintain, so the code has been rewritten with extensibility in mind, to support future development as well as to retain adequate support for older MediaWiki versions that use this extension.

The rewrite did not add any new features except support for JSON format. The code for the existing functionality was refactored using modern development patterns such as separation of concerns and dependency injection. Unit tests were added as well.

The configuration format for the update scripts changed, but most webmasters won’t need to change anything, and will be able to use the default settings. Changes will be needed only on sites that for some reason don’t use the default repositories.

New features are being planned for future versions that would optimise LocalisationUpdate to run faster and without any manual configuration. Currently, the client downloads the latest translations for all extensions in all languages and then compares which translations can be updated. By moving some of the complex processing to a separate web service, the client can save bandwidth by downloading only updated messages for specific updated languages used by the reader.

There are still more things to improve in LocalisationUpdate. If you are a developer or a webmaster of a MediaWiki site, please join us in shaping the future of this tool.

Niklas Laxström and Runa Bhattacharjee, Language Engineering, Wikimedia Foundation

Discovering and learning by asking questions

This post is a discovery report written by Vlad John and slightly edited for publication. It’s part of a series of candid essays written by Google Code-in students, outlining their first steps as members of the Wikimedia technical community. You can write your own.


In the past years, I’ve used Wikipedia as often as I’ve used Facebook. I’ve used it for homework or simply for finding something new. When I was first introduced to the Internet, I kept asking myself: how can someone make a site with so many people browsing it? This year, I found the answer in the Google Code-in contest. As I was browsing for a task that suited me, I found an organization called Wikimedia.

While browsing the tasks they offered, I found something that caught my eye. It was a task about editing the wiki. I was so happy that I had finally found a task that suited my tastes that I clicked “Claim Task” before reading what I had to do. But when I read more about the specifics of the task… well, it is enough to say that I had no idea how to start. I was supposed to “clean up” the “Raw projects” section of the Possible projects page. I clicked the link to the wiki page I was supposed to edit, and as I started working, I encountered several problems that I will describe in a moment. But thanks to my mentor, Quim Gil, I succeeded in completing the task.

I always wanted to edit a Wiki page, but at first I was afraid. What if I did something wrong? After posting a text file on the Code-in task’s page, I received a comment that said that in the end I’d need to edit the wiki page itself, so I might as well start early. This made sense, so I dove into the unknown territory of editing.

I started by looking at the history of the page to find the things I had to add. That took a while, but in a shorter time than I first thought necessary, I learned how to find information in earlier edits, how to edit the source code of the page and how to make minor edits to the headings and structure. But this was the easy part.

I just had to copy some names and move them to their appropriate place. However, when it came to reporting bugs, I was indeed lost. I knew from the task I had to use Bugzilla to report bugs and add comments, but I didn’t have the foggiest idea how to do it. That is when I started doing what I had to do in the first place: ask questions.

I realized that the whole point of this exercise was to teach students how to do different things, and the most important thing when learning is to ask questions everywhere: on forums, in the FAQ or the manual, or simply by searching more for the answer. So I began by reading the full Bugzilla guide, but that did not really answer my questions. At least, not until I found the “How to report a bug” guide. This gave me some important information, like what to look for and how a report should look.

But I still had one problem: the guide said a thing and the mentor said something else. So I decided to ask once more on the page of the task. In no time, I received an answer and a model. Apparently, the guide was right about one part of the task, and the mentor was right about another part. So, by combining the answers from these two sources, I managed to find the answer to my problem. Once I knew what I was looking for, and once I asked the right questions, I got the answers I needed.

From there, it was not too hard to start adding and commenting on bugs in Bugzilla. The next problem appeared when I had to add the bug reports to the wiki page… I thought I was done the moment I added the bugs to Bugzilla, but again my lack of attention and knowledge got the best of me. So I told myself: if asking the right question gets me the information I need, why not ask again? After all, I am here to learn.

So I went back to the task page and posted another two paragraphs of questions. Indeed, I received answers that helped me learn something about editing the source of the page. So I dove once again into the unknown and started the work. After a hard time finding the bug reports again, I was finally done and completed the task.

After finishing, I realised that a person can learn anything on his or her own, but learning is more effective when a mentor or teacher helps you. Also, a teacher who just tells you what to read without explaining is less helpful than one who knows how and what to explain, when to do it, and speaks to you in a kind way, helping you, as Quim Gil helped me, with explanations and examples to complete the task.

So, to sum up, if you ever want to learn something about Wikimedia (or anything else), the best way is to ask other people, whether a mentor, as Quim Gil was for me, or a complete stranger on a forum like Stack Overflow, which is an important place for coding and scripting help. Many people say that learning has no shortcuts, but if questions are not shortcuts, then they certainly are a real help in education. Why? Because with questions comes information, and with information comes knowledge.

Vlad John
2013 Google Code-in student

