Wikimedia blog

News from inside the Wikimedia Foundation

Technology

News and information from the Wikimedia Foundation’s Technology department (RSS feed).

After the slush, the flood

after the slush, the flush

When new code does not reach production for quite some time, it tends to pile up. It is like snow: when the thaw finally comes, it starts with a trickle, the trickles become streams, and all the streams rush down the mountain.

In the WMF Localisation team, we worked on our documentation, our help system and our tests. We went to conferences in Belgium and India, and we worked on many small iterative improvements. We rolled out webfonts to more wikis. Input methods were improved and deployed on request. Our translation memory has been working on translatewiki.net for ages, and it is now configured for use on the WMF wikis that use the Translate extension. We first experimented with a new algorithm, and we configured one of the Labs systems as a host for the memory of all the fine work we did and do.

Over time a lot of work went into things like plural rules. As the number of languages increases, and as we now support JavaScript as well as PHP, we are optimising our code and checking it again. We frequently find that refactoring is in order. It makes the code more elegant and easier to maintain. With added documentation and tests we ensure that it works well.
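To give an idea of why plural rules need this much care, here is a minimal sketch in Python. It is not the MediaWiki implementation: the function names and the `choose_form` helper are hypothetical, and the rules are simplified, integer-only versions of the CLDR-style categories for English, Russian and Arabic.

```python
# Simplified, integer-only plural rules (an illustration, not MediaWiki code).

def plural_en(n: int) -> str:
    # English: singular for exactly 1, plural otherwise.
    return "one" if n == 1 else "other"


def plural_ru(n: int) -> str:
    # Russian: the form depends on the last digit, except in the "teens".
    if n % 10 == 1 and n % 100 != 11:
        return "one"
    if 2 <= n % 10 <= 4 and not 12 <= n % 100 <= 14:
        return "few"
    return "many"


def plural_ar(n: int) -> str:
    # Arabic: six categories.
    if n == 0:
        return "zero"
    if n == 1:
        return "one"
    if n == 2:
        return "two"
    if 3 <= n % 100 <= 10:
        return "few"
    if 11 <= n % 100 <= 99:
        return "many"
    return "other"


PLURAL_RULES = {"en": plural_en, "ru": plural_ru, "ar": plural_ar}


def choose_form(lang: str, n: int, forms: dict) -> str:
    """Pick the message form for n; fall back to the 'other' form."""
    category = PLURAL_RULES[lang](n)
    if category in forms:
        return forms[category]
    return forms["other"]


if __name__ == "__main__":
    english = {"one": "1 file", "other": "{n} files"}
    for n in (1, 2, 21):
        print(choose_form("en", n, english).format(n=n))
```

Every language added brings its own rule, and each rule has to exist in both the PHP and the JavaScript code, which is why tests matter so much here.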

Another fine project waiting to get to the stage where it will flow into our codebase is an updated Easy Timeline. The functionality has always been broken in many of the “other” languages: languages written in a different direction or in a different script. The updated Easy Timeline has been revamped; it uses SVG to create the image and you can test it at the translatewiki sandbox. Amir welcomes bug reports and LOVES to hear your comments.

As you know, we use mingle for our project management (user guest, password guest). In it we have stories that explain the functionality that we are going to develop. Story 532 is one such:

As a potential translator, I want to be able to tell translation administrators in a structured way that I am interested in translating to one or more languages and at the same time provide them with some data about me and preferences on how and how often I would like to be contacted, so that translation administrators can more effectively and efficiently target translators.

Together with the acceptance criteria a narrative like this enables the developer to develop and the finished product to be accepted by our product manager. A story comes with tasks and once you have read the stories and the tasks you have a clue of what goes into getting you new functionality.

The conferences were great; we learned a lot from meeting so many wonderful people. Many tests are deployed and they run regularly. The documentation, including user documentation, is written and we would love for you to translate it into your language. We feel really pumped up to get cracking and provide you with more functionality in the next sprint.

Thanks,
Gerard Meijssen
Internationalization / Localization outreach consultant

Wikimedia engineering moving from Subversion to Git

Hello, MediaWiki developers and users! You may already be aware of this: our community is embarking on a journey to leave Subversion behind and migrate to Git for our source code repositories, starting on March 3rd. This is not an easy task. Here I’ll outline our rationale for this move, as well as our planned process.

What is Git?

Git is a distributed version control system originally developed by Linus Torvalds and others to manage the Linux kernel. In the past couple of years, it has taken off as a very robust and well-supported code repository. “Distributed” means that there is no central copy of the repository. With Subversion, Wikimedia’s servers host the repository and users commit their changes to it. In contrast, with Git, once you’ve cloned the repository, you have a fully functioning copy of the source code, with all the branches and tagged releases at your disposal.

Why switch?

Three major reasons:

To encourage participation: Since Git is distributed, it allows people to contribute with a much lower barrier to entry. Anyone will be able to clone the repository and make their own changes to keep track of them. And if you’ve got an account in our code review tool (Gerrit), you’ll be able to push changes for the wider community to review.

To fix our technical process: Subversion has technical flaws that make life difficult for developers. Notably, its implementation of branching is not very easy to use, which makes it hard to work with “feature branches”. Our community is very distributed, with many parallel efforts, and needs to integrate many different features, so we’d like to use feature branches more. Git branches are very easy to work with and merge between, which should make things easier for our development community. (Several other large projects, such as Drupal and PostgreSQL, have made the same switch for similar reasons, and we’ve done our best to learn from their experiences.)

Some quotes from our community:

“I love git just because it allows me to commit locally (and offline).” – Guillaume Paumier

“[Y]ou can create commits locally and push them to the server later (great for working without wifi), you can tell it ‘save my work so I can go do something else now’ in one command, and it’ll allow us to review changes before they go into “trunk” (master)…. without human intervention in merging things into trunk. Gerrit automates this process.” – Roan Kattouw

And finally, to get improvements to users faster: with better branching and a more granular code review workflow that suits our needs better, plus our ongoing improvements to our automated testing infrastructure, we won’t have to wait months before deploying already-written features and bugfixes to Wikimedia sites.

We had years of discussion before we finally decided to switch, but now we can look forward to more flexibility and power in our engineering processes.

What are we doing?

We’ve now done almost all the back-end work of preparing our repository for the move and are in the final steps of preparation (details). We’ve also written explanations of the new workflow, the migration schedule, issues yet to be addressed, and other related topics. Right now, we’re asking people to stop creating any new extensions in Subversion, and to watch the wikitech-l mailing list for more updates.

What are the next steps?

Over the next two and a half weeks, the Git repository that contains MediaWiki core and extensions will be brought in step with Subversion, and at first it will be read-only (no one will be able to push changes). This will allow developers to start cloning it to their local machines and getting used to things.

For MediaWiki core and for extensions that the Wikimedia Foundation deploys on its wikis, the switchover is pencilled in for the weekend of March 3rd. We’ll do core first, and then extensions after, but hopefully all in the same weekend. After the successful migration, the Subversion repository (for the directories that have moved to Git, such as /trunk/phase3/) will be made read-only.

See the full schedule.

I develop for a Wikimedia project. Do I have to switch to Git?

Only two projects are affected immediately: the core of MediaWiki and the extensions that get deployed on Wikimedia Foundation projects.

So, if you work on an extension that the Wikimedia Foundation does not use, or on a non-MediaWiki project hosted at svn.wikimedia.org, you have more time to decide. Talk it over with your community and decide whether you would like to move to Git immediately, move to Git sometime over the next several months, or move to another hosting provider sometime before mid-2013. We would like to gradually migrate all projects currently on Wikimedia’s Subversion repository so that we can make all of svn.wikimedia.org read-only by the middle of 2013, and thus only have to support one source control infrastructure.

More details.

Will training and documentation be available? When?

Yes, we will provide training and documentation to help you use the new workflow. Check our Git page and its links now, and watch that space! There will be more documentation as well as some interactive training sessions before the big switchover in early March.

If you have any questions, please ask in #mediawiki on Freenode or on wikitech-l.  Thank you!

Chad Horohoe
Git migration lead
Platform Engineering department
Wikimedia Foundation

Sumana Harihareswara
Volunteer Development Coordinator
Platform Engineering department
Wikimedia Foundation

The #MediaWiki #hackathon in Pune, #India

When good people get together in a friendly, well organised setting like this weekend in Pune, many great things happen. Several MediaWiki developers had come to provide the many people new to MediaWiki with their expertise and guide people into its inner workings.

Many people worked on Wikimedia mobile and the SmartPhone software, others worked on MediaWiki and its extensions. Bugs got fixed and functionality got extended.

One of the surprises was two people working on the localisation for the Mongolian language. The inclusion of a web font that will support the Dzongkha language is another.

Dzongkha is the official language of Bhutan and, according to Ethnologue, the script used is either the Tibetan script, Uchen style, or the Tibetan script, Umed style. These scripts and styles are also used for the Tibetan language, so it is not only Dzongkha that stands to benefit.

One of the highlights of the work on the SmartPhone app is support for scripts that are written from right to left; this is now “beta” functionality. Because more people looked at the code, several bugs received the attention needed to make them go away. Scrolling was one area that got attention; this results in a smoother user experience.

New input methods have been created: one for Punjabi transliteration and a Gujarati input method to be included in Narayam. The continued collaboration with RedHat engineers ensures that our work benefits both MediaWiki and RedHat/Fedora. We do realise that there is still a lot to do, and it is not only documentation. Additional work was done on the “visual on-screen keyboard” that was started at the previous hackathon in Pune; it still needs more testing and design work.

Thanks,
Gerard Meijssen
Internationalization / Localization outreach consultant

MediaWiki 1.19 deployment to Wikimedia sites: Test it before it breaks

The logo of MediaWiki (a yellow sunflower surrounded by two pairs of blue square brackets) with gradients symbolizing its coming to age for the next version

Wikimedia sites will gradually be upgraded to version 1.19 of MediaWiki over the second half of February 2012.

This article is available in other languages on mediawiki.org.


Wikimedia engineers are putting the final touches to the latest version of MediaWiki, the software that powers Wikipedia and its sister sites. This version, labeled “1.19wmf1”, will be deployed to Wikimedia sites in stages, starting next week.

We’ve recently set up a Beta cluster, replicating a selection of Wikimedia wikis, where Wikimedians have tested the new version and checked that it worked reasonably well with their local wiki’s specific customizations.

Things are looking good, and the current plan is to run the deployment in five stages between February 15th and March 1st, 2012. The schedule may change based on unexpected issues, so you should refer to the MediaWiki 1.19 roadmap for an up-to-date schedule of when your wiki will be affected. (more…)

Insights from mobile user experience research

Mobile Wikipedia readers in Brazil

As part of our commitment to provide free knowledge to everyone, the foundation has been redesigning our mobile platform (m.wikipedia.org and mobile.wikipedia.org) to enhance the reading experience and allow editing.  As a first step towards the redesign of the mobile gateway to better meet the needs of our users in the Global South, we conducted user experience research in India and Brazil among current and future users of Wikipedia mobile last summer.  We also carried out user experience research in the US to have a comparison with a mobile market which is more mature in terms of smartphone and 3G penetration, and has a more widespread adoption of tablets.

Our research in India and Brazil brought forth the following four opportunities with the greatest perceived impact for the mobile platform:

  1. Improving our search: Our research revealed a need to provide search suggestions, autocomplete, autocorrect and other tools that ease typing and search burdens on mobile devices; support search in all language Wikipedias and allow users to choose and switch between languages; incorporate transliteration tools for languages with fonts and characters that have poor mobile support; support and even enhance users’ existing habit of using Google to reach Wikipedia articles; and enable users to search within a Wikipedia page. We are happy to report that, drawing from the research, our mobile team has already implemented some of these opportunities, like full page search, autocomplete and inter-wiki links, in our mobile beta site.
  2. Optimizing our reading experience for mobile devices and generalized use.  Through our research, especially in India, we found that we were not redirecting a large breadth of devices in use to our mobile site. The mobile team quickly fixed this issue with the adoption of the open source library tera-WURFL for detecting mobile devices.  After speaking with respondents in India and Brazil, we found that there was a desire among users to modify or set one-time preferences for the display of images, the font size, and any element that affects page loading time and size. Similarly, there is an opportunity for allowing  preferences for language and navigation; the ability to watch or bookmark articles; or save content offline; offer content in more digestible pieces, or with quicker access (i.e. preview or easy access to the first paragraph, or a new “mobile summary”); search offline, i.e., while in transit or without a data plan; and generally follow expectations set by mobile web interactions and standards.  Some of these recommendations have been incorporated into our mobile product strategy.  Through this research we felt it was crucial to offer both an official iOS and Android app (which was officially released in January) that offers at minimum a simple and easy search and reading experience.
  3. Using the mobile platform both to increase user engagement and awareness of features on Wikipedia and to provide new opportunities for participation. The mobile site and potential apps provide many new pathways for engagement, participation, and contribution. At present, the mobile site can be used to build awareness around existing features on the site that current users are blind to (e.g. watchlists, accounts, editing, inter-language links, history); to provide features that make opening a Wikipedia account worthwhile, something that the majority of our participants do not currently see any reason to do; to increase visibility of local language Wikipedias, especially in India, since many English readers were not aware of the existence of Indic Wikipedias; to prompt users to download an official app when possible; and to interface with other web content on mobile devices (Google, news, entertainment, and sports content, for example “Wikitap”). The contributions that showed the highest potential for adoption were adding photographs, “flagging” or “marking” something that needs to be edited, removing or marking vandalism, adding links, adding location or geodata, and potentially making small typing or formatting edits.
  4. Mobile Editing. And finally, the mobile site can support the editing practice of existing editors by first offering those features in a mobile friendly format which are currently in high use on the site.  Those with the highest demand and potential are the “recent changes” page, which is consumed like an update feed or email; accessing watch lists; making reverts, especially with respect to vandalism; logging in and accessing account and user pages; and serving discussion pages and article histories.
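The device-detection step mentioned in the second item can be pictured with a very small Python sketch. This is not the Tera-WURFL API, which relies on a device database; it is a simplified user-agent heuristic, and the token list, the function names and the host names used as defaults are assumptions made for illustration.

```python
# A simplified user-agent heuristic (an illustration, not Tera-WURFL).
# The token list below is hypothetical and far from complete.
MOBILE_TOKENS = (
    "mobile",
    "android",
    "iphone",
    "ipod",
    "blackberry",
    "opera mini",
    "symbian",
    "windows phone",
)


def is_mobile(user_agent: str) -> bool:
    """Return True if the user-agent string looks like a mobile device."""
    ua = user_agent.lower()
    return any(token in ua for token in MOBILE_TOKENS)


def target_host(user_agent: str,
                desktop_host: str = "en.wikipedia.org",
                mobile_host: str = "en.m.wikipedia.org") -> str:
    """Choose which host a request should be served from."""
    return mobile_host if is_mobile(user_agent) else desktop_host


if __name__ == "__main__":
    phone = "Mozilla/5.0 (iPhone; CPU iPhone OS 5_0 like Mac OS X)"
    print(target_host(phone))
```

A token list like this misses many devices, which is the kind of gap the research uncovered; a maintained device database covers a far wider range.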

If you are interested in reading about our research in India and Brazil in detail, we have compiled the insights in a report which is available in PDF and wiki format. You can also watch video highlights from the interviews and check out some photographs from the field work in India and Brazil.

Mani Pande, Head of Global Development Research

Scaling media storage at Wikimedia with Swift

Wikipedia is huge. Almost four million articles in English alone — but as they say, a picture is worth a thousand words (actually, it’s usually closer to several million). In terms of raw bits on disk, the largest project is clearly the Wikimedia Commons, the free media repository integrated with all of the Wikimedia projects. In addition, many projects allow their own local media uploads. As a result, across all wikis, Wikimedia stores millions of images, sounds, and other media files.

We’ve been able to manage the load for quite a while by using two servers with lots of local storage (10 and 30 TB), but we’re pushing against that limit and we would like a more fault-tolerant option. So, for the last few months, we have been working on replacing the infrastructure that holds all that data.

Our goal is to have a storage system that will allow us to scale more easily, and accept large collections of media from projects like Wiki Loves Monuments, and the U.S. National Archives’ donation of their collection of photographs by Ansel Adams.

After evaluating a number of options, we chose to pursue OpenStack Swift. Swift is a distributed object storage system with automatic replication, so that if one host has problems the requested file is retrieved from another server with no interruption of service. Aside from meeting our needs around performance, reliability, and scalability, it is a good fit considering we are also using OpenStack products for Wikimedia Labs.
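The idea of replication with failover can be shown with a toy model in Python. This is only a sketch under stated assumptions: it does not use Swift's API or its ring and partition scheme, and the `Node` and `ToyCluster` classes, the hash-based placement and the replica count of 3 are all hypothetical.

```python
import hashlib


class Node:
    """A storage host that can be marked as down."""

    def __init__(self, name: str):
        self.name = name
        self.up = True
        self.objects = {}


class ToyCluster:
    """Stores each object on several nodes and reads from any healthy copy."""

    def __init__(self, node_names, replicas: int = 3):
        self.nodes = [Node(name) for name in node_names]
        self.replicas = min(replicas, len(self.nodes))

    def placement(self, key: str):
        # Hash the key to pick a starting node, then take the next neighbours.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        start = int(digest, 16) % len(self.nodes)
        return [self.nodes[(start + i) % len(self.nodes)]
                for i in range(self.replicas)]

    def put(self, key: str, data: bytes) -> None:
        # Write a copy to every node responsible for this key.
        for node in self.placement(key):
            node.objects[key] = data

    def get(self, key: str) -> bytes:
        # Try each replica in turn; skip hosts that are down.
        for node in self.placement(key):
            if node.up and key in node.objects:
                return node.objects[key]
        raise KeyError(key)


if __name__ == "__main__":
    cluster = ToyCluster(["node1", "node2", "node3", "node4"])
    cluster.put("thumb/320px-Example.jpg", b"image-bytes")
    cluster.placement("thumb/320px-Example.jpg")[0].up = False  # one host fails
    print(cluster.get("thumb/320px-Example.jpg"))  # still served from a replica
```

The reader never needs to know which host failed; the request simply moves on to the next copy.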

We have just completed the first milestone along the road to replacing our existing storage systems with Swift: all image thumbnails (scaled images such as a 320px version of a picture) are now stored on Swift. Our current production Swift cluster is made up of 4 back-end storage nodes with 22TB each and 2 front-end proxy nodes that handle user web requests. This new architecture provides us the scalability and reliability we need going forward.
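As a rough back-of-the-envelope check on those numbers, the short Python sketch below computes raw and usable capacity. The replica count of 3 is an assumption and is not stated in this post; the helper name is hypothetical, and real usable space would also depend on filesystem overhead and free-space headroom.

```python
def cluster_capacity_tb(storage_nodes: int, tb_per_node: float, replicas: int):
    """Return (raw, usable) capacity in TB for a fully replicated cluster."""
    raw = storage_nodes * tb_per_node
    usable = raw / replicas  # every object is stored `replicas` times
    return raw, usable


if __name__ == "__main__":
    # 4 back-end nodes with 22 TB each; 3 replicas is an assumed value.
    raw_tb, usable_tb = cluster_capacity_tb(4, 22, 3)
    print(f"raw: {raw_tb} TB, usable: {usable_tb:.1f} TB")
```

With these inputs the raw capacity is 88 TB, and about 29 TB of distinct data fits if every object is kept three times.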

Over the next few months we will build a second Swift cluster in our Virginia data center, then work on migrating all of the original media over to Swift as well. For more detail on the implementation and plan for Swift, you can read up on the documentation on Wikitech, ask questions in the comments below, or come and visit us in #wikimedia-tech on Freenode in IRC.

Ben Hartshorne
Operations
Wikimedia Foundation

Wikimedia engineering January 2012 report

Major news in January includes:

(more…)

Techies learn, make, win at Foundation’s first San Francisco hackathon

Participants at the San Francisco hackathon in January 2012

In January, 92 participants gathered in San Francisco to learn about Wikimedia technology and to build things in our first Bay Area hackathon.

After a kickoff speech by Foundation VP of Engineering Erik Möller (video), we led tutorials on the MediaWiki web API, customizing wikis with JavaScript user scripts and Gadgets, and building the Wikipedia Android app.  (We recorded each training; click those links for how-to guides and videos.)  We asked the participants to self-organize into teams and work on projects.  After their demonstration showcase, judges awarded a few prizes to the best demos.

(more…)

Free software community shares lessons learned in “Open Advice” book

Open Advice book cover

The "Open Advice" book is available for free download, or purchase as print from lulu.com.

The Open Advice book, a collection of essays, stories and lessons learned by members of the Free Software community, is out!

The book was just announced at FOSDEM, the Free and Open Source Software Developers’ European Meeting, in Brussels over the week-end.

About 50 authors from many different projects of the free software community were brought together by Lydia Pintscher, the book’s editor, who started the project in early 2011.

A year and 380 pages later, the book is now available, and tries to provide an answer to the question: What’s the key thing you would have liked to know when you started contributing?

Authors answer that question for many topics, ranging from “Writing patches” to “Documentation for Novices”, to business models, conferences, translation, design, and more.

I contributed “Learn from your users”, a chapter on user experience and usability testing. You’ll also recognize other names from the Wikimedia community, like Evan Prodromou, Markus Krötzsch and Felipe Ortega.

You can learn more about the book and the authors on the book’s website.

All the content of the book is released under the same license as Wikipedia, the Creative Commons Attribution Share-Alike license.

Check it out! You can download the book for free as a PDF file, order a print from lulu.com if you prefer paper books, or fork the text on GitHub.

I hope you’ll like the book, and it’ll prove useful, whether you’re new to the world of software, or you’re a seasoned contributor already.

Guillaume Paumier
Technical Communications Manager

Getting ready for when the freeze is done

When you look at the “sprint backlog” in mingle (guest, guest), you may notice that even though we have been slowed down by the slush, the feature freeze ahead of the imminent MediaWiki release, we are not sitting on our hands. Documentation, testing, code review and outreach are on our agenda.

Because of the way we are planning, it is apparent how much code review actually gets done. This sprint we added a review of the ArticleFeedback extension for its internationalization and localization aspects. This is a logical development considering that, with 280+ languages, we are not developing for one language. Our objective for this job is: “As a user I can use the functionality of the ArticleFeedbackv5 so that nothing looks odd in my language from an internationalization and localization perspective”. Reviews like this have been performed informally in the past by translatewiki.net staff. This review, however, will be done during Wikimedia hours and reported through Wikimedia channels.

One old open bug is about EasyTimeline. It started its life in 2005 and it is finally getting the attention it deserves. The bug explains the lack of support for languages like Arabic, Hebrew and Farsi that are written from right to left. The software depends on Ploticus, and for a long time we were waiting for a version of it that supports RtL languages. We are not waiting any longer, and you can read about the complexities involved in our story 230.

You could say that implementing a translation memory for page translation is a bit more adventurous; it is however debatable if that functionality is new; a translation memory has for a long time been functional at translatewiki.net. It is also very much a feature that makes people more productive. Our team has always had the goal of making life easy and productive for our editors and translators.
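For readers unfamiliar with the concept, a translation memory suggests earlier translations of similar source strings. The Python sketch below illustrates the idea with the standard library's `difflib`; it is not the algorithm used by the Translate extension, and the `suggest` helper, the cutoff value and the example strings are made up for illustration.

```python
import difflib

# Earlier translations: source string -> translation (illustrative data only).
MEMORY = {
    "Save page": "Pagina opslaan",
    "Show preview": "Voorvertoning tonen",
    "Show changes": "Wijzigingen tonen",
}


def suggest(source: str, memory: dict, limit: int = 3, cutoff: float = 0.6):
    """Return (matched source, translation) pairs for similar source strings."""
    matches = difflib.get_close_matches(source, list(memory), n=limit,
                                        cutoff=cutoff)
    return [(match, memory[match]) for match in matches]


if __name__ == "__main__":
    for match, translation in suggest("Save the page", MEMORY):
        print(f"{match!r} -> {translation!r}")
```

A translator who sees a close match only has to adapt it instead of starting from scratch, which is where the productivity gain comes from.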

The “grammar” functionality for JavaScript is part and parcel of the i18n tooling for our developers. It was not ready before the “slush” and it does make our lives difficult not having it available in the code. When you are building tests for “gender” and “plural”, it is so obvious to create them for “grammar” as well. In this sprint, “grammar” will be included in the code for all these good reasons.

This is the first time that there is a story for outreach. We are reaching out to all the Wikipedia language communities to have their own language support team. It will make a difference when all our language communities have been asked to provide their expertise to us. We already have found that many people show an interest and issues do get raised as a result.

Thanks,
Gerard Meijssen
Internationalization / Localization outreach consultant