Wikimedia blog

News from the Wikimedia Foundation and about the Wikimedia movement

Technology

News and information from the Wikimedia Foundation’s Technology department.

Agile and Trello: The planning cycle

This blog series will focus on how the Wikipedia App Team uses Trello for their day-to-day, week-to-week, and sprint-to-sprint development cycle. In its early days, the Wikipedia App was an experimental project that needed flexibility for its evolution. We looked at a number of tools, like Bugzilla, Mingle, and Trello, to wrangle our ever-growing to-do list. We found that most imposed a structure that was stifling rather than empowering, cumbersome rather than fun, and generally overkill for what we needed.

Trello looked attractive as it took no more than a couple of minutes to see its moving parts, was available on multiple platforms, and was simple to customize. We experimented with it and quickly found that we could make it do most of what we wanted.

For those unfamiliar with Trello, at its most basic level it is a list of lists, and it functions incredibly well within an Agile framework. Trello uses the concepts of boards, lists, items, and subitems: boards contain lists, which contain items, which in turn contain subitems.
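As a rough illustration of that nesting (the class and field names below are invented for this sketch, not Trello's API), a board-of-lists-of-items structure might be modeled like this:

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical names, purely to illustrate the nesting described above:
# a board holds lists, a list holds items (cards), and an item holds
# subitems (checklist entries).

@dataclass
class Subitem:
    text: str
    done: bool = False

@dataclass
class Item:
    title: str
    checklist: List[Subitem] = field(default_factory=list)

@dataclass
class TrelloList:
    name: str
    items: List[Item] = field(default_factory=list)

@dataclass
class Board:
    name: str
    lists: List[TrelloList] = field(default_factory=list)

# A backlog board with an "In analysis" column holding one story card.
backlog = Board("Backlog", [
    TrelloList("In analysis", [
        Item("As a reader, I want to share an article, so that ...",
             [Subitem("Acceptance criteria agreed"),
              Subitem("Design assets attached")]),
    ]),
])
```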

Here is how we use it:

Each idea starts out as a narrative or user story on our backlog board. Most of our stories are written in an “As a …, I want to …, so that …” format. This gives us a narrative justification for a unit of work rather than a list of technical requirements. Stories begin their life in the “In analysis” column, where the product manager (who acts as the product owner) vets the idea with other stakeholders, involves the Design team, and generally incubates the story. Anyone is welcome to add a story to this column.

When the product owner feels that a story has matured enough, they place it in the “ready for prioritization” column with any required design assets. As these stories increase in number, we begin to see the next sprint forming.

Within a couple of days, the team meets and the product manager discusses the theme of the upcoming sprint. A new sprint board is created and the product manager moves over the 3-5 most important stories for deeper analysis by the whole team, which collectively refines the story cards to have a clear set of acceptance criteria under the checklist column, flags stories that need additional design, and prioritizes them in top-down order.

Within a week’s time, the team meets again, but this time their goal is to estimate and do a final pass on each story card. We use a combination of Scrum for Trello and hat.jit.su to facilitate the estimation process. Once all stories have been estimated, the product manager re-prioritizes and checks the total against our sprint velocity, and the sprint is ready to start.

Thus at any point we have three active boards:

  • Backlog – where all stories start
  • Current Sprint – what developers are working on
  • Next Sprint – what’s coming up next

Next time we’ll see what happens from the developers’ standpoint during a sprint.

Tomasz Finc, Director of Mobile

Remembering Adrianne Wadewitz

Portrait of Adrianne Wadewitz at Wikimania 2012 in Washington, DC.

Each of us on the Wikipedia Education Program team is saddened today by the news of Adrianne Wadewitz’s passing. We know we share this sadness with everyone at the Wikimedia Foundation and so many in the Wikimedia and education communities. Our hearts go out to all of you, her family and friends. Today is a time for mourning and remembering.

Adrianne served as one of the first Campus Ambassadors for the Wikipedia Education Program (then known as the Public Policy Initiative). In this role, she consulted with professors, demonstrated Wikipedia editing and helped students collaborate with Wikipedia community members to successfully write articles. As an Educational Curriculum Advisor to the team, Adrianne blended her unique Wikipedia insight and teaching experience to help us develop Wikipedia assignments, lesson plans and our initial sample syllabus. Her work served as a base for helping university professors throughout the United States, and the world, use Wikipedia effectively in their classes.

Adrianne was also one of the very active voices in the Wikimedia community urging participation and awareness among women to tackle the project’s well-known gender gap. She was an articulate, kind, and energetic face for Wikipedia, and many know that her work helped bring new Wikipedians to the project. The Foundation produced a video exploring Adrianne’s work within the Wikipedia community in 2012.

Many in the Wikimedia community knew her from her exceptional and varied contributions, especially in the areas of gender and 18th-century British literature – in which she received a PhD last year from Indiana University, before becoming a Mellon Digital Scholarship Fellow at Occidental College. Since July of 2004, she had written 36 featured articles (the highest honor for quality on Wikipedia) and started over 100 articles – the latest being on rock climber Steph Davis.

Adrianne touched many lives as she freely shared her knowledge, expertise and passions with Wikipedia, her students, colleagues, friends and family. She will be deeply missed by all of us. Our condolences go out to her family during these very difficult times.

Rod Dunican
Director, Global Education

Wikipedia Education Program

  • See Adrianne’s user page on the English Wikipedia, her Twitter account, her home page and her blog at HASTAC (Humanities, Arts, Science and Technology Alliance and Collaboratory)
  • Wikipedians have begun to share their memories and condolences about Adrianne on her user talk page.
  • The leadership of the Wiki Education Foundation, where Adrianne was a board member, have also expressed their condolences.
  • Memorial post from HASTAC Co-founder Cathy Davidson.
  • Wikinews story on the passing of Adrianne Wadewitz.

Wikimedia’s response to the “Heartbleed” security vulnerability


Logo for the Heartbleed bug

On April 7th, a widespread issue in a central component of Internet security (OpenSSL) was disclosed. The vulnerability has now been fixed on all Wikimedia wikis. If you only read Wikipedia without creating an account, no action is required from you. If you have a user account on any Wikimedia wiki, you will need to log in again the next time you use your account.

The issue, called Heartbleed, would allow attackers to gain access to privileged information on any site running a vulnerable version of that software. Wikis hosted by the Wikimedia Foundation were potentially affected by this vulnerability for several hours after it was disclosed. However, we have no evidence of any actual compromise to our systems or our users’ information, and because of the particular way our servers are configured, it would have been very difficult for an attacker to exploit the vulnerability in order to harvest users’ wiki passwords.

After we were made aware of the issue, we began upgrading all of our systems with patched versions of the software in question. We then began replacing critical user-facing SSL certificates and resetting all user session tokens. See the full timeline of our response below.

All logged-in users send a secret session token with each request to the site. If a nefarious person were able to intercept that token, they could impersonate other users. Resetting the tokens for all users has the benefit of making all users reconnect to our servers using the updated and fixed version of the OpenSSL software, thus removing this potential attack.
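To see why resetting tokens helps, here is a generic sketch (not MediaWiki's actual session code) of tokens signed with a server-side secret: rotating the secret invalidates every token issued before the rotation, so every client has to establish a fresh session over the patched software.

```python
import hashlib
import hmac
import secrets

# Generic illustration only: session tokens signed with a server-side
# secret. Rotating the secret makes all previously issued tokens invalid.

server_secret = secrets.token_bytes(32)

def issue_token(user_id: str) -> str:
    sig = hmac.new(server_secret, user_id.encode(), hashlib.sha256).hexdigest()
    return f"{user_id}:{sig}"

def is_valid(token: str) -> bool:
    user_id, _, sig = token.partition(":")
    expected = hmac.new(server_secret, user_id.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(sig, expected)

token = issue_token("Alice")
assert is_valid(token)

# "Resetting all user session tokens" amounts to rotating the secret:
server_secret = secrets.token_bytes(32)
assert not is_valid(token)   # the old token is no longer accepted
```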

We recommend changing your password as a standard precautionary measure, but we do not currently intend to enforce a password change for all users. Again, there has been no evidence that Wikimedia Foundation users were targeted by this attack, but we want all of our users to be as safe as possible.

Thank you for your understanding and patience.

Greg Grossmeier, on behalf of the WMF Operations and Platform teams

Timeline of Wikimedia’s response

(Times are in UTC)

April 7th:

April 8th:

April 9th:

April 10th:

Frequently Asked Questions

(This section will be expanded as needed.)

  • Why hasn’t the “not valid before” date on your SSL certificate changed if you have already replaced it?
    Our SSL certificate provider keeps the original “not valid before” date (sometimes incorrectly referred to as an “issued on” date) in any replaced certificates. This is not an uncommon practice. Aside from looking at the change to the .pem files linked above in the Timeline, the other way of verifying that the replacement took place is to compare the fingerprint of our new certificate with our previous one; a small code sketch of such a check follows.
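A minimal sketch of such a fingerprint check, assuming you have recorded the previous certificate's fingerprint somewhere (the placeholder value below is not Wikimedia's real fingerprint):

```python
import hashlib
import ssl

# Fetch the currently served certificate and compute its SHA-256
# fingerprint, then compare it against a previously recorded value.

def cert_sha256_fingerprint(host: str, port: int = 443) -> str:
    pem = ssl.get_server_certificate((host, port))
    der = ssl.PEM_cert_to_DER_cert(pem)
    return hashlib.sha256(der).hexdigest().upper()

OLD_FINGERPRINT = "PREVIOUSLY_RECORDED_FINGERPRINT"   # placeholder value

current = cert_sha256_fingerprint("wikipedia.org")
print("Certificate was replaced:", current != OLD_FINGERPRINT)
```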

You can translate this blog post.


 


MediaWiki localization file format changed from PHP to JSON

Translations of MediaWiki’s user interface are now stored in a new file format: JSON. This change won’t have a direct effect on readers and editors of Wikimedia projects, but it makes MediaWiki more robust and more open to change and reuse.

MediaWiki is one of the most internationalized open source projects. MediaWiki localization includes translating over 3,000 messages (interface strings) for MediaWiki core and an additional 20,000 messages for MediaWiki extensions and related mobile applications.

User interface messages, originally written in English, and their translations have historically been stored in PHP files along with the MediaWiki code. New messages and their documentation were added in English, and these messages were translated on translatewiki.net into over 300 languages. The translations were then pulled by MediaWiki websites using LocalisationUpdate, an extension MediaWiki sites use to receive translation updates.

So why change the file format?

The change of file format was driven by the need to improve security, reduce localization file sizes and support interoperability.

Security: PHP files are executable code, so the risk of malicious code being injected is significant. In contrast, JSON files are only data, which minimizes this risk.

Reducing file size: Some of the larger extensions have had multi-megabyte data files. Editing those files was becoming a management nightmare for developers, so they were split into one file per language instead of storing all languages in a single large file.

Interoperability: The new format increases interoperability by allowing features like VisualEditor and Universal Language Selector to be decoupled from MediaWiki, because JSON message files can be used without MediaWiki. This was demonstrated earlier with the jquery.i18n library. This library, developed by Wikimedia’s Language Engineering team in 2012, offers internationalization features very similar to MediaWiki’s, but it is written entirely in JavaScript and stores messages and their translations in JSON format. With LocalisationUpdate’s modernization, MediaWiki localization files are now compatible with those used by jquery.i18n.
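As a rough sketch of what that interoperability buys, the snippet below reads a JSON message file of this general shape (an "@metadata" block plus key/value message pairs with $1-style placeholders) and formats a message without any MediaWiki code involved; the file contents and message key are invented for the example:

```python
import json

# A made-up message file in the general shape of MediaWiki-style
# JSON i18n files: "@metadata" plus message key/value pairs.
en_json = """
{
    "@metadata": { "authors": [ "Example Author" ] },
    "myextension-greeting": "Hello, $1!"
}
"""

messages = json.loads(en_json)

def msg(key: str, *params: str) -> str:
    """Look up a message and substitute $1, $2, ... placeholders."""
    text = messages[key]
    for i, value in enumerate(params, start=1):
        text = text.replace(f"${i}", value)
    return text

print(msg("myextension-greeting", "world"))   # Hello, world!
```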

An RFC on this topic was compiled and accepted by the developer community. In late 2013, developers from the Language Engineering and VisualEditor teams at Wikimedia collaborated to figure out how MediaWiki could best process messages from JSON files. They wrote a script for converting PHP to JSON, made sure that MediaWiki’s localization cache worked with JSON, and updated the LocalisationUpdate extension for JSON support.

Siebrand Mazeland converted all the extensions to the new format. This project was completed in early April 2014, when MediaWiki core switched over to processing JSON, creating the largest MediaWiki patch ever in terms of lines of code. The localization formats are documented on mediawiki.org, and MediaWiki’s general localization guidelines have been updated as well.

As a side effect, code analyzers like Ohloh no longer report skewed numbers for lines of PHP code, making metrics like comment ratio comparable with other projects.

Work is in progress on migrating other localized strings, such as namespace names and MediaWiki magic words. These will be addressed in a future RFC.

This migration project exemplifies collaboration at its best among the many MediaWiki engineers who contributed to it. I would like to specially mention Adam Wight, Antoine Musso, David Chan, Ed Sanders, Federico Leva, James Forrester, Jon Robson, Kartik Mistry, Niklas Laxström, Raimond Spekking, Roan Kattouw, Rob Moen, Sam Reed, Santhosh Thottingal, Siebrand Mazeland and Timo Tijhof.

Amir Aharoni, Interim PO and Software Engineer, Wikimedia Language Engineering Team

A young developer’s story of discovery, perseverance and gratitude

This post is a discovery report written by Jared Flores and slightly edited for publication. It’s part of a series of candid essays written by Google Code-in students, outlining their first steps as members of the Wikimedia technical community. You can write your own.


When I initially heard of the Google Code-In (GCI) challenge, I wasn’t exactly jumping out of my seat. I was a little apprehensive, since the GCI sample tasks used languages such as Java, C++, and Ruby. While I’ve had my share of experience with the languages, I felt my abilities were too limited to compete. Yet, I’ve always had a fiery passion for computer science, and this challenge presented another mountain to conquer. Thus, after having filtered through the hundreds of tasks, I took the first step as a Google Code-In student.

The first task I took on was to design a share button for the Kiwix Android app, an offline Wikipedia reader. Though Kiwix itself wasn’t a sponsoring organization for GCI, it still provided a branch of tasks under the Wikimedia umbrella. With five days on the clock, I researched vigorously and studied the documentation for Android’s share API.

After a few hours of coding, the task seemed to be complete. Reading through the compiler’s documentation, I downloaded all of the listed prerequisites, then launched the Kiwix autogen bash file. But even with all of the required libraries installed, Kiwix still refused to compile. Analyzing the error logs, I encountered permission errors, illegal characters, missing files, and mismatched dependencies. My frustration growing, I even booted Linux from an old installation DVD, and tried compiling there. I continued this crazy cycle of debugging until 2 am. I would have continued longer had my parents not demanded that I sleep. The next morning, I whipped up a quick breakfast, and then rushed directly to my PC. With my mind refreshed, I tried a variety of new approaches, finally reaching a point when Kiwix compiled.

With a newly-found confidence, I decided to continue pursuing more GCI tasks. Since I had thoroughly enjoyed the challenge presented by Kiwix, I initially wanted to hunt down more of their tasks. However, finding that there weren’t many left, I gained interest in Kiwix’s supporting organization: Wikimedia. I navigated to Wikimedia’s GCI information page and began familiarizing myself with the organization’s mission.

“We believe that knowledge should be free for every human being. We prioritize efforts that empower disadvantaged and underrepresented communities, and that help overcome barriers to participation. We believe in mass collaboration, diversity and consensus building to achieve our goals. Wikipedia has become the fifth most-visited site in the world, used by more than 400 million people every month in more than 270 languages.” – About Us: Wikimedia (GCI 2013)

Reading through the last sentence once more, I realized the amazing opportunities that were ahead of me. Whenever I needed to touch up on any given topic, Wikipedia was always one of the top results. Moreover, Wikipedia had become a source of entertainment for me and my friends. We always enjoyed hitting up a random article, then using the given links to find our way to Pokémon, Jesus, or maybe even Abraham Lincoln: Vampire Hunter.

Eager to begin, I chose video editing as my second task for Wikimedia. I began the long endeavor of watching, reviewing, and editing the two forty-five minute clips. Despite the lengthy videos, I was quite amused to see the technical difficulties that the Wikimedia team encountered during their Google Hangout. It was also comforting to put human faces behind the Wikimedia mentors of Google Code-in.

As with my first task, the work itself sped by quickly. But also similar to Kiwix, I encountered some difficulties with the “trivial” part of the task. I had never worked with the wiki interface before, so the wiki structure was somewhat foreign. I only had a vague idea of how to create a page. I also didn’t know where to upload files, nor did I know how to create subcategories. Nonetheless, after following the instructions in Wikipedia’s documentation, I finally managed to upload the videos. Marking the task as complete, I scouted for my third GCI task.

Unbeknownst to me, my third task for Wikimedia would also prove to be the most challenging so far. Since this task required me to modify the code, I requested developer access. With the help of Wikimedia’s instructions, I registered myself as a developer, generated a private key to use with their servers, and proceeded to download the source code.

Though my experience with Git was quite basic, MediaWiki provided an easy to follow documentation, which aided greatly in my efforts to download their repository. As I waited for the download to complete, I quickly set up an Apache server for a testing environment. Configuring the MediaWiki files for my server, I began the installation. Fortunately, MediaWiki’s interface was quite intuitive; the installer performed flawlessly with minimal user input.

“Off to a good start,” I chuckled quietly to myself, a grin spreading across my face. And with that statement I tempted fate, and my troubles began. Upon opening the code, I realized I couldn’t easily comprehend a single line. I had worked with PHP before, but this code was more advanced than anything I had written.

Running my fingers through my hair, I sighed in exasperation. I spent the next few hours analyzing the code, trying my best to decipher the functions. Suddenly, patterns began appearing and I started to recognize many of the functions. I tinkered with different modules until the code slowly unraveled.

Finally formulating a solution, my fingers moved swiftly across the keyboard, implementing the code with ease. Confident that I had tested my code well, I followed the instructions written in the GCI’s task description, and uploaded my very first patch to Gerrit.

I was surprised at how simple the upload was. But what especially surprised me was the immediate feedback from the mentors. Within a few minutes of the upload, MediaWiki developers were already reviewing the patch, making suggestions for improvement.

Thankful for their helpful input, I worked to implement the changes they suggested. Adding the finishing touches, I was ready to upload another patch. However, I was unsure whether I should upload a new change to Gerrit or push to the same patch as before. Unclear about which step to take, I made the rookie error of uploading a new commit to Gerrit.

My mistake quickly received a corrective response from Aude via the Gerrit comment system. While I initially felt embarrassed, I was also relieved that I didn’t have to work alone. In fact, I was thankful that the MediaWiki collaborators taught me how to do it right.

Checking out the link Aude had given me, I learned to squash the two commits together. However, when I tried to follow Aude’s instructions, I somehow managed to mix someone else’s code with my own. What’s even worse was I already pushed the changes to Gerrit, exposing my blunder publicly.

Had it been any normal day, I would’ve just been calm and tried my best to fix it. But it just so happened to be the Thanksgiving holiday (in the United States). I had to leave in a few minutes for a family dinner and I couldn’t bear the thought of leaving my patch in a broken state.

I felt about ready to scream. I abandoned my Gerrit patch, and navigated to the task page, ready to give up. But just as I was about to revoke my claim on the task, I remembered something Quim Gil had told another GCI student:

“They are not mistakes! Only versions that can be improved. Students learn in GCI, and all of us learn every day.”

Remembering this advice, I cleared my mind, ready to do whatever it would take, and learn while I was at it. And like an answer to my prayers, Hoo Man, another developer, posted a comment in Gerrit. He guided me through how I could return to my original patch and send my new improvements through. And more importantly, he motivated me to persevere.

I came into GCI as a passionate, yet undisciplined student. I’m thrilled that in joining this competition, the Wikimedia open source community has already helped me plant the seeds of discipline, perseverance, and collaboration. It’s no coincidence that my hardest task thus far was staged on Thanksgiving. Every year I express gratitude towards friends and family. But this year, Google Code-In and the Wikimedia community have made my gratitude list as well.

Jared Flores
2013 Google Code-in student


Read in this series:

Migrating Wikimedia Labs to a new Data Center

As part of ongoing efforts to reduce our reliance on our Tampa, Florida data center, we have just moved Wikimedia Labs to EQIAD, the new data center in Ashburn, Virginia. This migration was a multi-month project and involved hard work on the part of dozens of technical volunteers. In addition to reducing our reliance on the Tampa data center, this move should provide quite a few benefits to the users and admins of Wikimedia Labs and Tool Labs.

Migration objectives

We had several objectives for the move:

  1. Upgrade our virtualization infrastructure to use OpenStack Havana;
  2. Minimize project downtime during the move;
  3. Stop relying on nova-network and start using Neutron;
  4. Convert the Labs data storage system from GlusterFS to NFS;
  5. Identify abandoned and disused Labs resources.

Upgrade and Minimize Downtime

Wikimedia Labs uses OpenStack to manage the virtualization back-end. The Tampa Labs install was running a slightly old version of OpenStack, ‘Folsom’. Folsom is more than a year old now, but OpenStack does not provide an in-place upgrade path that doesn’t require considerable downtime, so we’ve been living with Folsom to avoid disrupting existing Labs services.

Similarly, a raw migration of Labs from one set of servers to another would have required extensive downtime, as simply copying all of the data would be the work of days.

The solution to both 1) and 2) was provided by OpenStack’s multi-region support. We built an up-to-date OpenStack install (version ‘Havana’) in the Ashburn data center and then modified our Labs web interface to access both centers at once. In order to ease the move, Ryan Lane wrote an OpenStack tool that allowed users to authenticate in both data centers simultaneously, and updated the Labs web interface so that both data centers were visible at the same time.

At this point (roughly a month ago), we had two different clouds running: one full and one empty. Because of a shared LDAP back-end, the new cloud already knew about all of our projects and users.

Two clouds, before migration

Then we called on volunteers and project admins for help. In some cases, volunteers built fresh new Labs instances in Ashburn. In other cases, instances were shut down in Tampa and duplicated using a simple copy script run by the Wikimedia Operations team. In either case, project functions were supported in both data centers at once so that services could be switched over quickly and at the convenience of project admins.

Two clouds, during migration

As of today, over 50 projects have been copied to or rebuilt in Ashburn. For those projects with uptime requirements, the outages were generally limited to a few minutes.

Switch to OpenStack Neutron

We currently rely on the ‘nova-network’ service to manage network access between Labs instances. Nova-network is working fine, but OpenStack has introduced a new network service, Neutron, which is intended to replace nova-network. We hoped to adopt Neutron in the Ashburn cloud (largely in order to avoid being stuck using unsupported software), but quickly ran into difficulties. Our current use case (flat DHCP with floating IP addresses) is not currently supported in Neutron, and OpenStack designers seem to be wavering in their decision to deprecate nova-network.

After several days of experimentation, expedience won out and we opted to reproduce the same network setup in Ashburn that we were using in Tampa. We may or may not attempt an in-place switch to Neutron in the future, depending on whether or not nova-network continues to receive upstream support.

Switch to NFS storage

Most Labs projects have a shared project-wide volume for storing files and transferring data between instances. In the original Labs setup, these shared volumes used GlusterFS. GlusterFS is easy to administer and designed for use cases similar to ours, but we’ve been plagued with reliability issues: in recent months, the lion’s share of Labs failures and downtime were the result of Gluster problems.

When setting up Tool Labs last year and facing our many issues with GlusterFS, Marc-Andre Pelletier opted to set up a new NFS system to manage shared volumes for the Tool Labs project. This work has paid off with much-improved stability, so we’ve adopted a similar system for all projects in Ashburn.

Again, we largely relied on volunteers and project admins to transfer files between the two systems. Most users were able to copy their data over as needed, scping or rsyncing between Tampa and Ashburn instances. As a hedge against accidental data loss, the old Gluster volumes were also copied over into backup directories in Ashburn using a simple script. The total volume of data copied was around 30 terabytes; given the many-week migration period, network bandwidth between locations turned out not to be a problem.
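The backup script itself isn't included in the post; a minimal sketch of the kind of per-project copy it describes might look like the following, with the host name, paths and project names invented for the example:

```python
import subprocess

# Hypothetical sketch only: copy each project's shared volume from the
# old Gluster host into a per-project backup directory in Ashburn.
PROJECTS = ["project-a", "project-b", "project-c"]

for project in PROJECTS:
    src = f"gluster.tampa.example.org:/gluster/{project}/"
    dst = f"/srv/backup/gluster/{project}/"
    subprocess.run(
        ["rsync", "-a", "--partial", src, dst],
        check=True,   # stop if any project fails to copy
    )
```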

Identify and reclaim wasted space

Many Labs projects and instances are set up for temporary experiments, and have a short useful life. The majority of them are cleaned up and deleted after use, but Labs still has a tendency to leak resources as the odd instance is left running without purpose.

We’ve never had a very good system for tracking which projects are or aren’t in current use, so the migration was a good opportunity to clean house. For every project that was actively migrated by staff or volunteers, another project or two simply sat in Tampa, unmentioned and untouched. Some of these projects may yet be useful (or might have users but no administrators), so we need to be very careful about prematurely deleting them.

Projects that were not actively migrated (or noticed, or mentioned) during the migration period have been ‘mothballed’. That means that their storage and VMs were copied to Ashburn, but are left in a shutdown state. These instances will be preserved for several months, pending requests for their revival. Once it’s clear that they’re fully abandoned (in perhaps six months), they will be deleted and the space reused for future projects.

Conclusions

In large part, this migration involved a return to older, more tested technology. I’m still hopeful that in the future Labs will be able to make use of more fundamentally cloud-designed technologies like distributed file shares, Neutron, and (in a perfect world) live instance migration. In the meantime, though, the simple approach of setting up parallel clouds and copying things across has gone quite well.

This migration relied quite heavily on volunteer assistance, and I’ve been quite charmed by how gracious the vast majority of volunteers were about this inconvenience. In many cases, project admins regarded the migration as a positive opportunity to build newer, cleaner projects in Ashburn, and many have expressed high hopes for stability in the new data center. With a bit of luck we’ll prove this optimism justified.

Andrew Bogott, DevOps Engineer

Modernising MediaWiki’s Localisation Update

Interface messages on MediaWiki and its many extensions are translated into more than 350 languages on translatewiki.net. Thousands of translations are created or updated each day. Usually, users of a wiki would have to wait until a new version of MediaWiki or of an extension is released to see these updated translations. However, webmasters can use the LocalisationUpdate extension to fetch and apply these translations daily without having to update the source code.

LocalisationUpdate provides a command line script to fetch updated translations. It can be run manually, but usually it is configured to run automatically using cron jobs. The sequence of events that the script follows is listed below, with a minimal sketch of the comparison step after the list:

  1. Gather a list of all localisation files that are in use on the wiki.
  2. Fetch the latest localisation files from either:
    • an online source code repository, using https, or
    • clones of the repositories in the local file system.
  3. Check whether English strings have changed to skip incompatible updates.
  4. Compare all translations in all languages to find updated and new translations.
  5. Store the translations in separate localisation files.
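Here is a minimal sketch of the comparison step (items 3 and 4 above), not the extension's actual PHP code: a translated string is applied only when its English source is unchanged and the translation itself differs from what the wiki already has.

```python
# Sketch of the "skip incompatible updates, keep new translations" logic.
def updatable(old_en, new_en, current_translations, new_translations):
    # Keys whose English source text changed: their old translations
    # may no longer match, so updates for them are skipped.
    changed_source = {k for k in new_en if new_en[k] != old_en.get(k)}

    updates = {}
    for key, text in new_translations.items():
        if key in changed_source:
            continue                          # step 3: skip incompatible
        if current_translations.get(key) != text:
            updates[key] = text               # step 4: new or improved
    return updates

old_en = {"search": "Search", "save": "Save"}
new_en = {"search": "Search", "save": "Publish"}   # English for "save" changed
current_fi = {"search": "Haku"}
new_fi = {"search": "Hae", "save": "Tallenna"}

print(updatable(old_en, new_en, current_fi, new_fi))
# {'search': 'Hae'} -- the "save" translation is skipped
```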

MediaWiki’s localisation cache will automatically find the new translations via a hook that the LocalisationUpdate extension subscribes to.

Until very recently, the localisation files existed in PHP format; they have now been converted to JSON format. This change required updates to LocalisationUpdate so it could handle JSON files. Extending the code piecemeal over the years had made the code base tough to maintain, so it has been rewritten with extensibility in mind, to support future development as well as to retain adequate support for older MediaWiki versions that use this extension.

The rewrite did not add any new features except support for JSON format. The code for the existing functionality was refactored using modern development patterns such as separation of concerns and dependency injection. Unit tests were added as well.

The configuration format for the update scripts changed, but most webmasters won’t need to change anything, and will be able to use the default settings. Changes will be needed only on sites that for some reason don’t use the default repositories.

New features are being planned for future versions that would optimise LocalisationUpdate to run faster and without any manual configuration. Currently, the client downloads the latest translations for all extensions in all languages and then compares which translations can be updated. By moving some of the complex processing to a separate web service, the client can save bandwidth by downloading only updated messages for specific updated languages used by the reader.

There are still more things to improve in LocalisationUpdate. If you are a developer or a webmaster of a MediaWiki site, please join us in shaping the future of this tool.

Niklas Laxström and Runa Bhattacharjee, Language Engineering, Wikimedia Foundation

Discovering and learning by asking questions

This post is a discovery report written by Vlad John and slightly edited for publication. It’s part of a series of candid essays written by Google Code-in students, outlining their first steps as members of the Wikimedia technical community. You can write your own.


In the past years, I’ve used Wikipedia as often as I’ve used Facebook. I’ve used it for homework or simply for finding something new. When I was first introduced to the world of the Internet, I kept asking myself: how can someone build a site with so many people browsing it? This year, I found the answer through the Google Code-in contest. As I was browsing for a task that suited me, I found an organization called Wikimedia.

While browsing the tasks they offered, I found something that caught my eye. It was a task about editing the wiki. I was so happy that I had finally found a task that suited my tastes that I clicked “Claim Task” before reading what I had to do. But when I read more about the specifics of the task… well, it is enough to say that I had no idea how to start. I was supposed to “clean up” the “Raw projects” section of the Possible projects page. I clicked the link to the wiki page I was supposed to edit, and as I started working, I encountered several problems that I will describe in a moment. But thanks to my mentor, Quim Gil, I succeeded in completing the task.

I always wanted to edit a Wiki page, but at first I was afraid. What if I did something wrong? After posting a text file on the Code-in task’s page, I received a comment that said that in the end I’d need to edit the wiki page itself, so I might as well start early. This made sense, so I dove into the unknown territory of editing.

I started by looking at the history of the page to find the things I had to add. That took a while, but in a shorter time than I first thought necessary, I learned how to find information in earlier edits, how to edit the source code of the page and how to make minor edits to the headings and structure. But this was the easy part.

I just had to copy some names and move them to their appropriate place. However, when it came to reporting bugs, I was indeed lost. I knew from the task description that I had to use Bugzilla to report bugs and add comments, but I didn’t have the foggiest idea how to do it. That is when I started doing what I had to do in the first place: ask questions.

I realized that the whole point of this exercise was to teach students how to do different things, and that the most important thing when learning is to ask questions everywhere: post on forums, consult the FAQ or the Manual, or simply search more for the answer. So I began by reading the full Bugzilla guide, but that did not really answer my questions. At least, not until I found the “How to report a bug” guide. This gave me some important information, like what to look for and how a report should look.

But I still had one problem: the guide said one thing and the mentor said something else. So I decided to ask once more on the page of the task. In no time, I received an answer and a model. Apparently, the guide was right about one part of the task, and the mentor was right about another part. So, by combining the answers from these two sources, I managed to find the answer to my problem. Once I knew what I was looking for, and once I asked the right questions, I got the answers I needed.

From there, it was not too hard to start adding and commenting on bugs in Bugzilla. The next problem appeared when I had to add the bug reports on the wiki page… I thought I was done the moment I added the bugs to Bugzilla, but again my lack of attention and knowledge got the best of me. So I told myself: if asking the right question gets me the information I need, why not ask again? After all, I am here to learn.

So I went back to the task page and posted another two paragraphs of questions. Indeed, I received answers that helped me learn something about editing the source of the page. So I dove once again into the unknown and started the work. After a hard time finding the bug reports again, I was finally done and I completed the task.

After finishing, I realised that a person can learn anything on his or her own, but learning is more effective if a mentor or teacher helps you. Also, a teacher who just tells you what to read without explaining is less helpful than one who knows how and what to explain, when to do it, and how to speak to you in a kind way; that is how Quim Gil helped me complete the task, with explanations and examples.

So, to sum up, if you ever want to learn something about Wikimedia (or other things), the best way is to ask other people, whether a mentor like Quim Gil was for me, or a complete stranger on a forum like StackOverflow, which is an important place for coding and scripting help. Many people say that learning has no shortcuts, but if questions are not shortcuts, then they sure are a real help in education. Why? Because with questions comes information, and with information comes knowledge.

Vlad John
2013 Google Code-in student


Read in this series:

A junior developer discovers MediaWiki

This post is a discovery report written by Coder55 and slightly edited for publication. It’s part of a series of candid essays written by Google Code-in students, outlining their first steps as members of the Wikimedia technical community. You can write your own.


I’m a 17-year-old boy from Germany interested in computer science. I write my own little programs in PHP, Python and Java and have even produced some Android apps. I completed a Python course in three days, and now I’m using Python to solve math problems. I heard about Google Code-in on a German news site for young people interested in computer science.

Account creation and language selection

The instructions for Google Code-in students were easy to understand, even for people who aren’t so good at English. After that, I created an account on mediawiki.org. The registration form looked modern; I wanted to take the user name ‘Coder55’, but it was already taken, so an account creation error was displayed. The text I had typed in for the password and email was deleted after the error; maybe it could be saved in a session variable and written back into the text fields via JavaScript.

After registering and logging in, I saw many different options in the top line. It was easy to change the language and to read my welcome message. Maybe the button with the text ‘log out’ could be replaced by a logout button with a little picture, to make the top line smaller and even easier to understand.

After that, I changed the language to German and Spanish because I wanted to see how much of the site had been translated. I was quite disappointed that only the top menu was completely translated. The left sidebar was not completely translated, even though many important links can be found there, like one to the Main Page. I was also surprised that the language of the content on the page didn’t change after I changed my language options: if I’m on the Main Page and I change the language to German, I still see the Main Page in English, although the left menu has partially changed to German. This puzzled me until I found out I had to click on ‘Hauptseite’, ‘Página principal’ etc. to see the Main Page in another language.

How to become a MediaWiki hacker

I am really interested in development, so the next thing I did was visit the How to become a MediaWiki hacker page, where I found interesting tutorials that explained how to develop something on the MediaWiki platform. The page was clearly arranged and I really liked it. It clearly separated the required abilities (e.g. PHP, MySQL) and made it easy to see where I needed to learn something and where I already knew enough. The ‘Get started’ part was particularly helpful: I could start extending MediaWiki quite quickly.

One thing that was missing for someone like me: example code for a really easy extension. Although all the aspects of developing are explained in detail in the Developing manual, seeing those easy extensions requires following several links; it would be really helpful for beginners to include and explain one or two of these examples in the manual.

I had already written some little programs in PHP (a chat server, a forum, etc.), so the next thing I did was to study MediaWiki’s coding conventions; they were explained clearly and were easy to understand. The ‘C borrowings’ part was really interesting.

Around MediaWiki: API, bugzilla, git and Wikitech

Unfortunately, I didn’t find the video on the API page very helpful. The pictures were blinking and the voices hard to understand. But the rest of the API documentation was really informative and easy enough to understand.

After that, I looked at the “Development” section of the left sidebar. I visited the Bugzilla overview, and then the actual site. I really liked the idea of Bugzilla, where every developer can see where help is needed. However, if you don’t know specifically what to look for, the search function in Bugzilla isn’t very helpful[Note 1].

I then clicked on the link called ‘browse repository’. I was positively surprised by what the site looked like. I especially liked the possibility to see which parts of MediaWiki had just been updated. I also took a look at Wikitech; the Main Page looked really similar to Wikipedia and MediaWiki, so it seemed easy to navigate.

The Pre-Commit Checklist

On the next day, I read about how to install and configure MediaWiki. The documentation was clear and easy to read, but I didn’t understand all of it, probably because I’m more interested in developing than in hosting.

Following this, I looked into more details about developing at the Developer hub. I had already studied the coding conventions, so I started reading the Pre-Commit Checklist.

This checklist contained many questions, but for someone like me who hasn’t yet uploaded code there, some of them are hard to understand. The part about testing was clearer for me because it was explained a little bit more. Maybe the questions in the checklist should be written in a little more detail, or some of the difficult words should be converted into links.

I liked having an overview of all the conventions at the bottom of the page. I could easily navigate to another convention list, like the coding conventions for JavaScript. These conventions were explained in detail and with clear examples. I especially liked the part about whitespace, where many rules have been written clearly and concisely.

In conclusion

MediaWiki is a very interesting platform and although some things are not perfect (e.g. translation or the registration form), it is easy to join the community. The most active contributors are accessible on IRC, which makes communication easier. After discovering the technical world of MediaWiki, I’m really interested in getting involved in the community, although that will need to wait until I finish school.

Coder55
2013 Google Code-in student


  1. Editor’s note: Bugzilla has since been upgraded, and its main page now features common search queries.

Read in this series:

Through the maze of newcomer developer documentation

This post is a discovery report written by David Wood and slightly edited for publication. It’s part of a series of candid essays written by Google Code-in students, outlining their first steps as members of the Wikimedia technical community. You can write your own.


This discovery essay touches on my general thoughts as I initially started to browse and look into developing for MediaWiki.

I’ve split it into three sections: Setting up, where I cover my experiences while working with pywikibot for a previous Google Code-In task; First Impressions, where I describe my thoughts as I browse the documentation geared towards newcomers; and Developer Hub, where I describe my thoughts as I venture into the actual development articles.

Setting up

Before looking to develop for MediaWiki, I had previously completed a task relating to pywikibot. I found that the mentors were very helpful and available for advice.

However, I did have issues with setting up the development environment for work on pywikibot. It seemed very complex, and at first I did not fully understand what I was required to do. As someone who hadn’t worked on large-scale projects such as MediaWiki before, I was confused as to why so many accounts and tools, such as a Gerrit account, needed to be set up beforehand.

Although I now understand, I feel that a guide explaining to newcomers (not necessarily those new to collaborating with git, but those new to lesser-known tools such as Gerrit) why these tools are necessary would be helpful. And although I understand that in some cases not much can be simplified, as the setup is genuinely complicated, perhaps a more in-depth guide would help as well (keep in mind, this refers to the guide for installing pywikibot; the guide for MediaWiki in general may be better).

First impressions: a guided tour for newcomers

Coming from only basic experience with a small project within MediaWiki, I was pleasantly surprised with the quality and simplicity of the information on the How to become a MediaWiki hacker page. There was a lot of information; for example, instead of simply telling the reader that they required knowledge of PHP, MySQL and JavaScript, the guide went on to link to where they could gain such knowledge.

From there, I went to read A brief history and summary of MediaWiki. This was especially interesting as not only was it an engrossing read, it also helped me understand the principles under which MediaWiki is developed, such as the fine line between performance and functionality. Such information helps a new user like myself understand the goals behind MediaWiki and the mindset in which I should be working.

Another pleasant surprise was that even the more technical articles, such as Security for Developers, were written in plain English, without a lot of technical language. And even where they got technical, things were explained well. Guides that carry a lot of importance, such as this one on security, benefit even more than most from being simple for newcomers, as that makes it more likely that a newcomer will understand and implement what they’ve learnt.

Another note I made was that all the links that would be relevant to newcomers, such as Coding Conventions, were all easily found.

Developer Hub: What next?

My next stop was the Developer Hub, where I found that I wasn’t sure how exactly to proceed. Unlike the last article, there didn’t seem to be a clear path to follow, most likely because this article is not geared directly towards newcomers to MediaWiki.

This is where I experienced the most issues; from here there was no more crutch helping me along. I somewhat expected, as I had seen on other projects, a simple guide telling newcomers what to do next, and unless I simply missed it, there wasn’t one. I was left unsure as to what to do next: should I just start browsing code? Look at feature requests? Or for bugs? I think this is where some guidance would be helpful for newcomers; getting to this point was well documented, but after you’ve set everything up, you’re left wondering what comes next. Some sort of list of easy bugs, or projects for newcomers to contribute to, would give some guidance as to what type of contributions they should be looking to make.

I also noted that some information linked from the Developer Hub, such as an archived roadmap, was out of date or marked as only available for historical purposes. While I understand the reasoning behind this, it’s still confusing that these links remain so prominent on the page.

In conclusion

While I didn’t install MediaWiki personally, my experiences with the complexity of setting things up, as detailed in the first part, were from a pywikibot perspective, as I come from a Python background rather than a PHP background. However, I would consider contributing to MediaWiki in the future if I ever take the time to learn PHP, as it not only seems an enjoyable experience, but I also appreciate the ideologies behind MediaWiki: to support a community that creates and curates freely reusable knowledge on an open platform.

David Wood
2013 Google Code-in student


Read in this series: