Wikimedia blog

News from inside the Wikimedia Foundation

Technology

News and information from the Wikimedia Foundation’s Technology department (RSS feed).

Opening our operations with Wikimedia Labs

For the past year and a half we’ve been working on a project named Wikimedia Labs, which enables us to invite our community to contribute to how our sites are run. Labs is a cloud computing environment using OpenStack for development, testing and deployment of Wikimedia’s infrastructure as a whole, enabling us to treat our infrastructure as an open source software project.

The problems we’re solving

When Wikipedia and its sister projects started, volunteers had root level access on our infrastructure. They were the only roots and most of the infrastructure they built is still in use today. Our lenient access policy made us flexible, so changes could happen quickly. Also, the sites were smaller, had far fewer users, and large, fundamental changes could be made in production.

Growth has made us less willing to give out root access to volunteers. Because of the size of our sites, downtime is less acceptable. But having fewer volunteers means we have fewer ideas, which in turn reduces our ability to make changes quickly. We haven’t had a new volunteer root in years. We haven’t even had a new volunteer with shell access. Engaging volunteers and enabling them to easily contribute is a wider problem as well.

Our software development community scales with volunteers. Unfortunately, operations doesn’t scale in a similar way right now. We’re limited to the staff operations engineers we currently have. The staff is great, but because operations can’t scale to meet the needs of a large and growing number of developers, operations is a bottleneck. Furthermore, our access policy prevents volunteer developers from learning how our infrastructure works.

This leads to a situation where our staff developers and volunteer developers can’t easily collaborate. Our volunteers also have no way of appropriately testing their changes, since our infrastructure is complex and difficult to replicate. This means it’s harder to take contributions, which further slows the pace of changes on our sites.

(more…)

First of many MediaWiki 1.20 deployments has begun

The logo of MediaWiki (a yellow sunflower surrounded by two pairs of blue square brackets) with gradients symbolizing its coming of age for the next version

Wikimedia sites will gradually be upgraded to version 1.20 of MediaWiki in April 2012.

Wikimedia engineers have finished up the latest version of MediaWiki, the software that powers Wikipedia and its sister sites. We have begun deploying this version, labeled “1.20wmf1,” to all Wikimedia sites in stages. We started on April 10th and will continue until April 25th.

Yes, we only deployed MediaWiki 1.19 a few weeks ago. This new update is part of our effort to get you fixes and improvements much more regularly (a reason we recently switched from Subversion to Git).

We plan to deploy the latest software every two weeks. Rather than calling each version of the deployed software 1.20, 1.21, 1.22, etc. every two weeks, we’ll be using a variation of the “1.20” moniker for the next few months.

We’re decoupling our deployment process (to Wikimedia sites like Wikipedia) from our release process for the standalone MediaWiki installer used on third-party sites. We plan to have MediaWiki 1.20wmf1 and 1.20wmf2 in April, 1.20wmf3 and 1.20wmf4 in May, etc., until we actually release a new MediaWiki 1.20 installer this fall (probably October).

Only after this point will we start referring to deployments as “1.21” deployments. The cycle will repeat approximately every six months, with Wikimedia deployments every two weeks, and installer releases every six months.
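To make the naming scheme concrete, here is a minimal Python sketch that generates deployment labels on a two-week cadence. The function name `deployment_labels` is hypothetical, and anchoring every later deployment to April 10 (the start of the 1.20wmf1 rollout) with a strict 14-day interval is an assumption made for illustration; the roadmap remains the authoritative schedule.

```python
from datetime import date, timedelta


def deployment_labels(release, first_deploy, count, interval_days=14):
    """Yield (label, start_date) pairs such as ("1.20wmf1", date(2012, 4, 10)).

    Hypothetical helper: assumes a fixed interval between deployments.
    """
    for n in range(1, count + 1):
        start = first_deploy + timedelta(days=interval_days * (n - 1))
        yield f"{release}wmf{n}", start


# Assumption: every deployment starts exactly 14 days after the previous one.
schedule = list(deployment_labels("1.20", date(2012, 4, 10), 4))

for label, start in schedule:
    print(label, start.isoformat())
```

Under these assumptions the first four labels fall on April 10, April 24, May 8 and May 22, which matches the "two in April, two in May" pattern described above.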

We’ve already tried out the 1.20wmf1 version on a test wiki and on mediawiki.org, and things are looking good. But the schedule may change based on unexpected issues, so you should refer to the MediaWiki 1.20 roadmap for an up-to-date schedule of when your wiki will be affected.

What’s new

This is a fairly small set of changes, compared to the March deployment of MediaWiki 1.19. This is intended to minimize disruption and possible issues, and make it easier to identify the cause of problems, since the possibly problematic code will be much more recent.

The biggest thing you’ll notice is the new diff style (example on mediawiki.org), designed to improve the experience of color-blind and partially sighted visitors.

More polish you’ll notice: There is a new option on Special:Prefixindex and Special:Allpages to hide redirects (addressing bug 30963). New edit emails for watched pages always provide a link to the edit which triggered the mail (fixing bug 32210).  And “Creating” is now given in the page title instead of “Editing” when you are creating a page (fixing bug 22870).

And, of course, developers have improved the software “under the hood” in many ways. A list of all changes is available in the draft release notes.

Snags and glitches?

If, despite our efforts, you encounter issues due to the upgrade, we’ll try to fix them as soon as we can. Get an account and report issues in our bug tracker, which is where we look for reports of problems. The faster you tell us about problems, the faster we can address them.

Thanks!

Sumana Harihareswara, Volunteer development coordinator
Rob Lanphier, Director of Platform Engineering
Images contained in this blog post are available under CC-BY-SA

Download the text of the entire English Wikipedia

If you’d like to read Wikipedia on an airplane (of the offline variety) or in an area with no or limited connectivity, or install it in a university, or just have it handy in case of a zombie apocalypse, you can now download a full text copy of the English Wikipedia (from January 2012) in the convenient OpenZIM format, which was specifically developed for sharing wiki content.

OpenZIM files can be read in multiple reader applications, the most popular of which is Kiwix, available for Mac, Windows, Linux, and Sugar.

Start your BitTorrent client and grab a copy of the 9.7GB file (.torrent link, other download options). You can also download content packages directly from within Kiwix using its library feature, including content from sister projects like Wiktionary and Wikisource, as well as non-Wikimedia content.

While the ZIM file doesn’t include images (that would blow it up to ~100GB for thumbnail-sized images), it does come with all the lists, tables, citations, and even mathematical formulas included in the online version.

Wikimedia content has always been made available under free and open licensing terms in raw copies, but ZIM content packages offer a higher level of convenience for the end user.

Please note that this OpenZIM file was prepared by Emmanuel Engelhart, the developer of Kiwix, and feedback should be directed to him (contact at kiwix dot org) or submitted through the Kiwix feedback system.

New Wikipedia app for iOS (and an update for our Android App)

We launched the official Wikipedia application for Android almost two months ago and the response has been tremendous. We’ve had ~2.25 million installations and ~5000 ratings (with a 4.4/5.0 average). Preliminary reports also indicate about 23 million page views per month via the app. In short, it has been doing pretty well!

iOS App Launch

Today we’re excited to announce a new version of the Wikipedia app for iOS. This has all the features from our Android app, styled to be consistent with iOS:

  • Search suggestions
  • Full text search
  • “Did you mean?” results
  • Saving pages for offline viewing
  • Share pages via Twitter, Facebook
  • Save pages to Read It Later
  • Read current page in other languages
  • Map integration to view nearby articles
  • View location of current article + nearby articles in a map
  • Set the default language
  • Navigation history features
  • … and some more!

This release is based on the same code that powers our Android application, an advantage of basing our app on Apache Cordova (previously PhoneGap). This enables us to reach the greatest number of platforms with the least amount of code. Fully embracing HTML5, CSS3, and JavaScript commits us to the open Web technologies of the future.
(more…)

Wikimedia engineering March 2012 report

Major news in March includes:

(more…)

The Wikipedia data revolution

The second phase of Wikidata will aim to augment the infoboxes which are currently widely used on Wikipedia to display structured data

Wikimedia Deutschland, the German chapter of the Wikimedia movement, and the Wikimedia Foundation are proud to announce Wikidata, a collaboratively edited database of the world’s knowledge and the first new Wikimedia project since 2006.

Wikidata will support the more than 280 language editions of Wikipedia with one common source of structured data that can be used in all articles of the free encyclopedia. Wikidata is expected to lead to a higher consistency and quality within Wikipedia articles, as well as increased availability of information in the smaller language editions. At the same time, Wikidata will decrease the maintenance effort for the 90,000 volunteers editing Wikipedia.

“Wikidata is ground-breaking. It is the largest technical project ever undertaken by one of the 40 international Wikimedia chapters,” said Pavel Richter, CEO of Wikimedia Deutschland. “Wikimedia Deutschland is thrilled and dedicated to significantly improving the data management of the world’s largest encyclopedia with this project.”

In addition to the Wikimedia projects, the data is expected to be beneficial for numerous external applications, especially for annotating and connecting data in the sciences, in government, and for applications using data in very different ways. The data will be published under a free Creative Commons license.

The initial development of Wikidata is being funded with a donation of 1.3 million Euros, half of which comes from the Allen Institute for Artificial Intelligence [ai]². The Institute supports long-range research activities that have the potential to accelerate progress in artificial intelligence. It was established in 2010 by Microsoft co-founder Paul G. Allen, whose contributions to philanthropy and the advancement of science and technology span more than 25 years.

“Wikidata is a simple and smart idea, and an ingenious next step in the evolution of Wikipedia,” said Dr. Mark Greaves, Vice President of the Allen Institute for Artificial Intelligence. “It will transform the way that encyclopedia data is published, made available, and used by a global audience. Wikidata will build on semantic technology that we have long supported, will accelerate the pace of scientific discovery, and will create an extraordinary new data resource for the world.”

One quarter of Wikidata’s initial funding has been donated by the Gordon and Betty Moore Foundation through its Science Program. “It is important for science,” said Chris Mentzel, Gordon and Betty Moore Foundation science program officer. “Wikidata will both provide an important data service on top of Wikipedia, and also be an easy-to-use, downloadable software tool for researchers, to help them manage and gain value from the increasing volume and complexity of scientific data.”

Google, Inc. has provided another quarter of Wikidata’s funding. “Google’s mission is to make the world’s information universally accessible and useful,” said Chris DiBona, Director, Open Source at Google. “We’re therefore pleased to participate in the Wikidata project which we hope will make significant amounts of structured data available to all.”

Wikidata will be developed in three phases. The first phase is expected to be finished by August 2012. It will centralize links between the different language versions of Wikipedia. In the second phase, editors will be able to add and use data in Wikidata. The results of the second phase are scheduled to be released in December 2012. The third and final phase will allow for the automatic creation of lists and charts based on the data in Wikidata. This will close the initial development process for Wikidata.
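The first phase, centralizing the links between language editions, can be pictured with a small Python sketch. The `Item` class, its field names and the sample data are all hypothetical and are not Wikidata's actual data model; the sketch only contrasts one shared record per topic with every article keeping its own list of links to all the others.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Item:
    """One topic shared by all language editions (hypothetical model)."""
    item_id: str
    # language code -> article title in that language edition
    sitelinks: Dict[str, str] = field(default_factory=dict)


def language_links(item: Item, current_lang: str) -> Dict[str, str]:
    """Links to show on the article in `current_lang`: every other edition."""
    return {lang: title for lang, title in item.sitelinks.items()
            if lang != current_lang}


def links_stored_per_article(editions: int) -> int:
    """Total links if each of `editions` articles lists all the others."""
    return editions * (editions - 1)


def links_stored_centrally(editions: int) -> int:
    """Total links if one shared item holds a single entry per edition."""
    return editions


# Illustrative data only.
berlin = Item("example-item", {"en": "Berlin", "de": "Berlin", "fr": "Berlin"})
print(language_links(berlin, "en"))
print(links_stored_per_article(280), links_stored_centrally(280))
```

For a topic covered in 280 language editions, the per-article layout stores 280 × 279 = 78,120 links, while the central layout stores 280 entries, and a renamed article has to be updated in only one place.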

The team of eight developers is being led by Dr. Denny Vrandečić. Formerly of the Karlsruhe Institute of Technology, he works with Wikimedia Deutschland and is, together with Dr. Markus Krötzsch of the University of Oxford, co-founder of the Semantic MediaWiki project, which has pursued the goals of Wikidata for the last few years. The proposal for Wikidata was developed with financial support from the EU project RENDER, which also involves Wikimedia Deutschland as a use-case partner.

Wikimedia Deutschland will perform the initial development, and plans to hand over operation and maintenance of the project to the Wikimedia Foundation by March 2013.

Matthew Roth
Global Communications Manager 

Wikipedia Mobile gets a face lift

A growing number of visitors access the mobile site of Wikipedia and it is an area the engineering team is keen to improve. To do this, we are offering a more functional and polished experience adapted for mobile users, who operate in a much more confined world compared to those on the desktop.

This week we pushed several new and updated design changes to our beta. We hope these changes will provide a more professional look and a better experience for you. These include changes to the footer, a cleaner design for revealing and hiding sections, and a revamped full-screen search experience. The mechanism for toggling between desktop and mobile has also moved from the footer to the top navigation menu to the left of search to allow users to switch more effortlessly.

References can now be read in place

Full screen search

In addition to this we have also pushed an experimental feature which makes it easier to refer to references on articles without having to plunge to the bottom of the page. Now clicking on a reference will load an overlay which readers can consult without losing their place in the article.

We are keen to gather feedback to stabilise these additions and make these changes available by default to a much larger audience. In particular and as always, we are interested in any device-specific issues being brought to our attention as well as feedback on the new design. Let us know how you find the experience: the good, the bad, and any quirks that you discover.

We are also experimenting with animations when revealing references and would appreciate thoughts from the community on which is felt to work best. By default, references are revealed by a fade in/out effect but we would appreciate thoughts on whether a slide animation or no animation would be preferable.

Opt in to our beta and try them out today. We look forward to your feedback which can be provided either here or by your involvement in the design process.

– Jon Robson, Software Developer Mobile

Helping readers improve Wikipedia: First results from Article Feedback v5

Figure 1. One of the feedback forms tested in the AFTv5 experiments (Option 1).

The Wikimedia Foundation, in collaboration with editors of the English Wikipedia, is developing a tool to enable readers to contribute productively to building the encyclopedia. To that end, we started development of a new version of the Article Feedback Tool (known as AFTv5) in October 2011. The original version of the tool, which allows readers to rate articles based on a star system, launched in 2010. The new version invites readers to write comments that might help editors improve Wikipedia articles. We hope that this tool will contribute to the Wikimedia movement’s strategic goals of increasing participation and improving quality.

Testing new feedback forms

On December 22, 2011, we started testing three different designs for the AFTv5 feedback forms:

  • Option 1: Did you find what you were looking for? (shown above)
  • Option 2: Make a suggestion, give praise, report a problem or ask a question
  • Option 3: Rate this article

The purpose of this first experiment was to measure the type, usefulness and volume of feedback posted with these feedback forms. For example, does asking a reader to describe what they were looking for (option 1) provide more actionable feedback than asking them to make a suggestion (option 2)?

We enabled AFTv5 on a small, randomly selected set (0.6%) of articles on the English Wikipedia, as well as a second set of high-traffic or semi-protected articles. A feedback form, randomly selected from the above three options, was placed at the bottom of each page. The feedback form was also accessible via a link docked on the bottom right corner of the page.  The resulting comments were then analyzed along a number of dimensions.
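The sampling described above can be sketched in a few lines of Python. This is not the AFTv5 implementation: hashing the page ID, the salt strings, and the names `in_sample` and `pick_form` are assumptions chosen so that the same article always lands in the same bucket and always shows the same form. Whether the real experiment assigned forms per article or per page view is not stated in the text.

```python
import hashlib

SAMPLE_PER_THOUSAND = 6  # 0.6% of articles, as stated in the text
FORM_OPTIONS = ("option1", "option2", "option3")


def _stable_number(page_id: int, salt: str) -> int:
    """Map a page ID to a large, evenly spread integer (hypothetical helper)."""
    digest = hashlib.sha256(f"{salt}:{page_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def in_sample(page_id: int) -> bool:
    """True for roughly 0.6% of page IDs, always the same ones."""
    return _stable_number(page_id, "sample") % 1000 < SAMPLE_PER_THOUSAND


def pick_form(page_id: int) -> str:
    """Assign one of the three feedback forms to a page, deterministically."""
    return FORM_OPTIONS[_stable_number(page_id, "form") % len(FORM_OPTIONS)]


sampled = [page_id for page_id in range(100_000) if in_sample(page_id)]
print(len(sampled), "of 100000 hypothetical page IDs sampled")
print({page_id: pick_form(page_id) for page_id in sampled[:3]})
```

Using separate salts for the two decisions keeps the choice of form independent of whether the article was sampled, so each form should end up on about a third of the sampled articles.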

(more…)

A profile in free collaboration

Wikimedia Foundation operations engineer Ryan Lane. Photo by Victor Grigas, CC-BY-SA 3.0.

Most top websites have thousands of software developers on staff, creating new features and keeping the site running securely. The Wikimedia Foundation has about forty. That’s pretty amazing, considering Wikipedia is the fifth most popular web property in the world. So, what’s our secret?

Well, we don’t have any secrets.

We make everything free, in every sense of the word. The technology we operate has been built by thousands of people around the world who collaborate freely and build upon each other’s contributions. Every article, every picture, every piece of code is free for anyone to use, reuse, copy, distribute and improve.

“Other tech companies wouldn’t share their installation, configuration or system documentation,” said Ryan Lane, an operations engineer at the Wikimedia Foundation. These proprietary data are the competitive advantage most websites have over their peers and they guard them dearly. “Wikipedia documents and shares all of that.”

“No other organization of our scope would dream of being this open,” he added. “It is our fundamental organizing principle.”

Lane understands what it means to operate in secret. Before coming to the Foundation, he spent six years working on classified projects for the U.S. government at the Naval Oceanographic Office (NAVO). He was forbidden from speaking about his work.

“In the government, I wouldn’t be allowed to talk about any of it. Not being able to talk about anything I do is really painful,” he said of working in a closed environment. “The ability to share everything is very freeing.”

Lane hails from New Orleans, where he studied computer science at the University of New Orleans. He has been with the Foundation for nearly two years, managing web infrastructure to ensure that Wikimedia projects become more reliable and efficient. For Lane, working in an open-source and transparent environment is what makes his work meaningful.

“In computer science, it’s very difficult not to be able to share your knowledge with other people. The way I learned most of the things I know is because people shared their expertise with me,” he noted.

Because the Wikimedia sites are so open, according to Lane, it’s much easier to collaborate with the community. In addition to the roughly 40 software developers on staff at the Foundation, there are more than 200 regular volunteer developers improving MediaWiki software, the backbone of Wikipedia and thousands of other wikis.

Lane manages Wikimedia Labs, a project that was created to allow volunteers to make contributions to MediaWiki development, tools and analytics. Working in an open environment means Lane can not only talk about a problem but also give a total stranger a replica of our configuration system so they can help change and improve our operations infrastructure.

At a recent hackathon in San Francisco, Lane said, a programmer who had never previously worked within Wikimedia’s environment fixed a bug in the logging infrastructure behind our https site. His code was good and Lane pushed it to production. “It was running live within a few hours,” he said.

According to Lane, in a closed environment everyone has to do everything themselves, which requires more people on the whole for the organization. Or you have to pay “a lot of money to get support to come and help you, and the support is generally subpar in comparison.”

When asked whether being so transparent was a security liability, Lane argued the value of open-source was more significant than the risk of someone hacking the projects.

“It might be a little crazy to share our server configuration,” Lane admitted. “To a point, it does make us more vulnerable, but I think there’s enough benefit in it to outweigh the worries about the vulnerabilities.”

(For more information about Lane’s work with the Wikimedia projects, read his blog here.)

Reporting and story by Elaine Mao and Jordan Hu
Communications Department Interns

Project ideas, students, and mentors wanted to improve Wikimedia tech this summer

Google Summer of Code 2012

For the seventh year in a row, the Wikimedia Foundation is participating in the Google Summer of Code program. Google Summer of Code (GSoC) is a program where Google pays summer students USD 5000 each to code for open source projects for three months (read more).

We hope 2012’s students will develop useful chunks of MediaWiki, help us get their code shipped, and fall in love with our community such that they stay with us for years to come.

This year’s project ideas include improvements to CentralNotice, taxobox editing, search, translation tools, and more.  Interested?

University, community college, and graduate students around the world are eligible to apply to Google Summer of Code. You don’t need to be a computer science or IT major, and you can work from home.

MediaWiki is the Wikimedia Foundation's key open source project, powering Wikipedia and our other sites.

We are looking for students who already know some PHP. We also strongly prefer that you have some experience working with Linux, Apache, and MySQL environments, and with the Git version control system. If you haven’t contributed to MediaWiki before, How to become a MediaWiki hacker is a good place to start; we will strongly prefer candidates who submit patches before the April 6th GSoC application deadline.

If you’d like to participate, check out the timeline. Make sure you are available full-time from 21 May till 20 August 2012, and have a little free time from 23 April till 20 May for ramp-up. Please read our wiki page and start talking with us on IRC in #mediawiki on Freenode about a possible project.  Then you’ll write a proposal and submit it via the official GSoC website. The deadline for you to submit a project proposal is April 6th, but we encourage you to start early and talk with us about your idea first.

We’re also seeking experienced MediaWiki developers anywhere in the world to help select and mentor student projects. We’ll take you even if you live in the southern hemisphere and it’s not summer for you. :-) You’ll need to be available online consistently so you can respond to student questions between now and late August. As Brion Vibber put it, if you “are knowledgeable about MediaWiki — not necessarily knowing every piece of it, but knowing where to look so you can help the students help themselves” then please consider helping out.

I’m administering our participation in GSoC. So I am encouraging students to apply, getting project ideas, and managing the application process overall. I look forward to seeing students discover the joy of collaborative work that improves the Wikimedia experience for millions of users. Help us spread the word.

Sumana Harihareswara
Volunteer Development Coordinator, Wikimedia Foundation
MediaWiki Coordinator, GSoC 2012