Wikimedia blog

News from the Wikimedia Foundation and about the Wikimedia movement

Operations

A profile in free collaboration

Wikimedia Foundation operations engineer Ryan Lane. Photo by Victor Grigas, CC-BY-SA 3.0.

Most top websites have thousands of software developers on staff, creating new features and keeping the site running securely. The Wikimedia Foundation has about forty. That’s pretty amazing, considering Wikipedia is the fifth most popular web property in the world. So, what’s our secret?

Well, we don’t have any secrets.

We make everything free, in every sense of the word. The technology we operate has been built by thousands of people around the world who collaborate freely and build upon each other’s contributions. Every article, every picture, every piece of code is free for anyone to use, reuse, copy, distribute and improve.

“Other tech companies wouldn’t share their installation, configuration or system documentation,” said Ryan Lane, an operations engineer at the Wikimedia Foundation. This proprietary information is the competitive advantage most websites hold over their peers, and they guard it closely. “Wikipedia documents and shares all of that.”

“No other organization of our scope would dream of being this open,” he added. “It is our fundamental organizing principle.”

Lane understands what it means to operate in secret. Before coming to the Foundation, he spent six years working on classified projects for the U.S. government at the Naval Oceanographic Office (NAVO). He was forbidden from speaking about his work.

“In the government, I wouldn’t be allowed to talk about any of it. Not being able to talk about anything I do is really painful,” he said of working in a closed environment. “The ability to share everything is very freeing.”

Lane hails from New Orleans, where he studied computer science at the University of New Orleans. He has been with the Foundation for nearly two years, managing web infrastructure to ensure that Wikimedia projects become more reliable and efficient. For Lane, working in an open-source and transparent environment is what makes his work meaningful.

“In computer science, it’s very difficult not to be able to share your knowledge with other people. The way I learned most of the things I know is because people shared their expertise with me,” he noted.

Because the Wikimedia sites are so open, according to Lane, it’s much easier to collaborate with the community. In addition to the roughly 40 software developers on staff at the Foundation, there are more than 200 regular volunteer developers improving MediaWiki software, the backbone of Wikipedia and thousands of other wikis.

Lane manages Wikimedia Labs, a project that was created to allow volunteers to make contributions to MediaWiki development, tools and analytics. Working in an open environment means Lane can not only talk about a problem, he can give a total stranger a replica of our configuration system so they can help change and improve our operations infrastructure.

At a recent hackathon in San Francisco, Lane said, a programmer who had never previously worked within Wikimedia’s environment fixed a bug in the logging infrastructure behind our HTTPS site. His code was good and Lane pushed it to production. “It was running live within a few hours,” he said.

According to Lane, in a closed environment everyone has to do everything themselves, so the organization needs more people overall. Or you have to pay “a lot of money to get support to come and help you, and the support is generally subpar in comparison.”

When asked whether being so transparent was a security liability, Lane argued that the value of open source outweighed the risk of someone hacking the projects.

“It might be a little crazy to share our server configuration,” Lane admitted. “To a point, it does make us more vulnerable, but I think there’s enough benefit in it to outweigh the worries about the vulnerabilities.”

(For more information about Lane’s work with the Wikimedia projects, read his blog.)

Reporting and story by Elaine Mao and Jordan Hu
Communications Department Interns

Transfer of Wikipedia sites from GoDaddy complete

After months of deliberation and a complicated transfer, the Wikimedia Foundation domain portfolio has been successfully transferred from GoDaddy to MarkMonitor. The portfolio transfer was formally completed on Friday, March 9th, 2012. The transfers were done seamlessly and our sites did not experience any interruption of service or other issues during the procedure.

As the operator of the fifth most visited web property in the world, the Foundation cares deeply about who handles our domain names. We had been considering a move away from GoDaddy for some time (our legal department felt the company was not the best fit for our domain needs), and we began actively seeking other domain management providers in December 2011. GoDaddy’s initial support of the Stop Online Piracy Act (SOPA), the controversial anti-piracy legislation in the U.S. House of Representatives, reaffirmed our decision to end the relationship.

After exploring numerous alternatives, the Foundation’s legal team decided that MarkMonitor could best provide the comprehensive services that we needed. MarkMonitor is a U.S.-based registrar with an office in San Francisco and has substantial experience managing other high-traffic domains. The company will help the Foundation consolidate and centralize management of all of its domains, will provide services needed to manage a global domain portfolio and will better protect our domains with additional security features.

The Foundation was already utilizing MarkMonitor’s brand protection services and we found their dedicated customer support team’s work to be exceptional. The use of their domain management services ensures greater efficiency in the handling of the Foundation’s trademark and domain name portfolios.

We have been very impressed with MarkMonitor to this point and we are confident we have placed the Wikimedia domain portfolio in competent hands.

Michelle Paulson
Legal Counsel

Scaling media storage at Wikimedia with Swift

Wikipedia is huge. Almost four million articles in English alone — but as they say, a picture is worth a thousand words (actually, it’s usually closer to several million). In terms of raw bits on disk, the largest project is clearly the Wikimedia Commons, the free media repository integrated with all of the Wikimedia projects. In addition, many projects allow their own local media uploads. As a result, across all wikis, Wikimedia stores millions of images, sounds, and other media files.

We’ve been able to manage the load for quite a while by using two servers with lots of local storage (10 TB and 30 TB), but we’re pushing against that limit, and we would like a more fault-tolerant option. So, for the last few months, we have been working on replacing the infrastructure that holds all that data.

Our goal is to have a storage system that will allow us to scale more easily and accept large collections of media, such as those from Wiki Loves Monuments and the U.S. National Archives’ donation of its collection of photographs by Ansel Adams.

After evaluating a number of options, we chose to pursue OpenStack Swift. Swift is a distributed object storage system with automatic replication, so if one host has problems, the requested file is retrieved from another server with no interruption of service. Aside from meeting our needs for performance, reliability, and scalability, it is a good fit considering we are also using OpenStack products for Wikimedia Labs.
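To make the replication idea concrete, here is a minimal Python sketch of how a Swift-style cluster can place several copies of each file and fall back to another copy when a host is unavailable. It is a toy model under stated assumptions, not Swift's implementation: the node names, the partition power, the replica count of three and the round-robin assignment (`partition_for`, `nodes_for`, `put`, `fetch`) are hypothetical simplifications of Swift's ring.

```python
import hashlib

# Toy model of Swift-style placement (hypothetical values, not Wikimedia's config).
PART_POWER = 4                                # 2**4 = 16 partitions; real rings use many more
REPLICAS = 3                                  # assumed number of copies per object
NODES = ["node1", "node2", "node3", "node4"]  # hypothetical back-end storage nodes


def partition_for(path: str) -> int:
    # Take the top PART_POWER bits of an MD5 digest of the object path.
    digest = hashlib.md5(path.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") >> (32 - PART_POWER)


def nodes_for(path: str) -> list:
    # Assign REPLICAS distinct nodes to the partition, round-robin.
    part = partition_for(path)
    return [NODES[(part + i) % len(NODES)] for i in range(REPLICAS)]


def put(path: str, data: bytes, store: dict) -> None:
    # Write a copy of the object to every node assigned to its partition.
    for node in nodes_for(path):
        store.setdefault(node, {})[path] = data


def fetch(path: str, store: dict, down: set) -> bytes:
    # Try each assigned node in turn, skipping any that are unavailable.
    for node in nodes_for(path):
        if node not in down and path in store.get(node, {}):
            return store[node][path]
    raise LookupError(f"no available replica holds {path}")


store = {}
thumb = "/thumb/e/ex/Example.jpg/320px-Example.jpg"
put(thumb, b"jpeg bytes", store)
# Even with the first assigned node down, the read succeeds from another copy.
print(fetch(thumb, store, down={nodes_for(thumb)[0]}))  # b'jpeg bytes'
```

Because every object lives on several nodes, a reader only needs one healthy copy, which is what lets a single failed host go unnoticed by users.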

We have just completed the first milestone along the road to replacing our existing storage systems with Swift: all image thumbnails (scaled images such as a 320px version of a picture) are now stored on Swift. Our current production Swift cluster is made up of 4 back-end storage nodes with 22TB each and 2 front-end proxy nodes that handle user web requests. This new architecture provides us the scalability and reliability we need going forward.
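As a rough check on what that hardware means in practice, the arithmetic below compares raw and usable capacity. It assumes three replicas per object, which is a common Swift setting but is not stated in this post, and it ignores filesystem overhead and free-space headroom.

```python
# Back-of-envelope capacity for the thumbnail cluster described above.
STORAGE_NODES = 4
TB_PER_NODE = 22
REPLICAS = 3  # assumption: a common Swift setting, not stated in this post

raw_tb = STORAGE_NODES * TB_PER_NODE  # total disk across the back-end nodes
usable_tb = raw_tb / REPLICAS         # every object is stored REPLICAS times

print(raw_tb, round(usable_tb, 1))  # 88 29.3
```

Under that assumption, each file occupies three times its own size on disk, so usable space is roughly a third of the raw 88 TB.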

Over the next few months we will build a second Swift cluster in our Virginia data center, then work on migrating all of the original media over to Swift as well. For more detail on the implementation and plan for Swift, you can read up on the documentation on Wikitech, ask questions in the comments below, or come and visit us in #wikimedia-tech on Freenode in IRC.

Ben Hartshorne
Operations
Wikimedia Foundation

Techies learn, make, win at Foundation’s first San Francisco hackathon

Participants at the San Francisco hackathon in January 2012

In January, 92 participants gathered in San Francisco to learn about Wikimedia technology and to build things in our first Bay Area hackathon.

After a kickoff speech by Foundation VP of Engineering Erik Möller (video), we led tutorials on the MediaWiki web API, customizing wikis with JavaScript user scripts and Gadgets, and building the Wikipedia Android app. (We recorded each training session; how-to guides and videos are available for each.) We asked the participants to self-organize into teams and work on projects. After their demonstration showcase, judges awarded a few prizes to the best demos.
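For readers who missed the API tutorial, the sketch below shows the general shape of a MediaWiki web API request: every call goes to a wiki's api.php endpoint, with the action and its options in the query string. The helper name `api_query_url` is hypothetical; `action=query`, `prop=info`, `titles` and `format=json` are standard API parameters. The sketch only builds the URL and does not send the request.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def api_query_url(**params: str) -> str:
    # All API calls share one endpoint; the action and its options go in the query string.
    return f"{API_ENDPOINT}?{urlencode(params)}"


url = api_query_url(action="query", prop="info", titles="Main Page", format="json")
print(url)
# https://en.wikipedia.org/w/api.php?action=query&prop=info&titles=Main+Page&format=json
```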


Nobody notices when it’s not broken: New database servers deployed

The Technical Operations team has just completed behind-the-scenes work that will likely never be noticed by our readers.

Our External Storage databases hold the text for every version of every wiki page; they have slowly grown over the life of Wikipedia and its sister projects. Ten years is a lifetime on the Internet, and the incremental changes that were made to our external storage system over that period, though appropriate at the time they were made, resulted in a setup that was a challenge to maintain and which was becoming unreliable.

Graph of query duration

We spent a few weeks analyzing all the various servers across which the page text data was spread, in order to gather it all together onto a single host. From there, it could be replicated onto newer, more reliable and higher performance hardware. Along the way, we found and fixed a number of inconsistencies to make the dataset more regular.

The deployment of the new hardware took a few days (as we moved things piece by piece) and was finished this past Monday with no fanfare. There was a brief period (about 10 minutes) during which articles could not be edited while we switched writes to the new hardware. The end result is a barrage of small improvements, all of which together make for a happy TechOps team:

  • average query duration has dropped from about 15ms to around 8ms and the worst case from 576ms down to 60ms;
  • replication and failover processes are now well known and standardized;
  • total hardware used has dropped from around 30 servers to 8, now in two locations;
  • hosts no longer double up as web servers and database servers for text; dedicated servers are used for the database.
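Using the approximate figures above, the size of each improvement can be expressed as a percentage reduction. The helper `pct_drop` is introduced here only for illustration.

```python
def pct_drop(before: float, after: float) -> float:
    """Percentage reduction going from `before` to `after`."""
    return (before - after) / before * 100


print(f"average query duration: {pct_drop(15, 8):.0f}% lower")   # 47% lower
print(f"worst-case duration:    {pct_drop(576, 60):.0f}% lower")  # 90% lower
print(f"server count:           {pct_drop(30, 8):.0f}% lower")    # 73% lower
```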

It’s a small victory in the battle against entropy, but an important prerequisite for carrying out our mission of providing unfettered and reliable access to the sum of all knowledge.

Ben Hartshorne, Operations Engineer

Tech meetup moves Wikimedia infrastructure forward

Earlier this month, about thirty MediaWiki developers and interested technologists gathered in New Orleans to learn and to work on Wikimedia’s technical infrastructure.  We made broad progress on the infrastructure of innovation at Wikimedia (notes).  Specifically:

Tim Starling and DJ Bauch driving towards greater media file storage system independence and robustness

  • We are now much closer to officially opening the doors to Wikimedia Labs and giving far more people the ability to contribute to MediaWiki without having to set up and maintain their own development environments at home.  Wikimedia Labs will provide hosted, virtualized test and development sandboxes for new and experienced programmers and systems administrators.  Many developers got beta Labs accounts, we tested at a larger scale, and we fixed several bugs.
  • Developers agreed to create a file backend abstraction layer so that large-scale MediaWiki installations can store big collections of large media files on any one of several storage systems. (Wikimedia plans to use Swift, which is open source.) Microsoft’s Ben Lobaugh and SAIC’s DJ Bauch also collaborated on improving MediaWiki’s performance on Microsoft technologies. Developers made architectural decisions, refactored some existing code, and improved documentation and tests for the SwiftMedia extension to MediaWiki.
  • We now have a continuous integration server up and running. It will continuously run tests against the latest features and bug fixes that developers write, resulting in fewer bugs and faster development. Developers will need to write tests to reap the benefits, so Chad Horohoe taught a test-writing workshop.

  • Max Semenik finished and demonstrated the first version of his API Query Sandbox.  This allows software developers anywhere to experiment with ways to automatically get data from Wikipedia or other sites that run MediaWiki, thus enabling wider and deeper reuse of Wikimedia content.
  • Operations folks continued the Puppetization of our infrastructure: they completely reworked Varnish management in Puppet, and worked on Puppet configurations for SwiftMedia testing. This configuration management work will ensure that ops can move faster and more confidently in building and maintaining Wikimedia infrastructure. And Canonical’s Mark Mims and Kapil Thangavelu worked on improving methods for Wikimedia developers “to spin up stacks of services within the labs environment” using Juju (more details).
  • Since the engineering department is planning a switch from Subversion to Git in the next few months, Brion Vibber taught nearly everyone there how Git works (slides, audio) and how we’ll be using Git in the future. This change in our source code repository and workflow will, we hope, enable more speed and flexibility in development, both for WMF developers and community contributors.
  • We prioritized and addressed several open requests for the operations team and defect reports about the latest version of MediaWiki, 1.18, which had just been deployed across WMF sites.
  • Roan found and fixed an issue that was spouting symbolic link errors into our Apache logs, so now it’ll be easier for us to see more dangerous errors in those logs.
  • Google Summer of Code students Salvatore Ingala and Kevin Brown made progress on integrating their summer projects into MediaWiki as used and deployed by others; Salvatore and WMF developer Roan Kattouw have a plan for getting his improvements to user scripts reviewed and deployed, so they can benefit Wikimedia readers and editors.
  • A volunteer came in on Friday night knowing nothing about developing for MediaWiki, and by the end of the weekend had a working development environment on her laptop and had some ideas about how to contribute.
  • We had substantive conversations about the summer internship program and about third-party collaboration that will affect how we work in the future.
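The file backend abstraction layer described above is, at heart, a common interface that each storage system implements so the rest of the software does not care where files live. The Python sketch below illustrates only that design idea; MediaWiki's actual file backend code is written in PHP and has a different API, and the class and method names here (`FileBackend`, `MemoryBackend`, `LocalDiskBackend`, `store`, `fetch`, `upload`) are hypothetical.

```python
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path


class FileBackend(ABC):
    """Hypothetical interface: callers store and fetch files by name only."""

    @abstractmethod
    def store(self, name: str, data: bytes) -> None: ...

    @abstractmethod
    def fetch(self, name: str) -> bytes: ...


class MemoryBackend(FileBackend):
    """Keeps files in a dict; handy for tests."""

    def __init__(self) -> None:
        self._files = {}

    def store(self, name: str, data: bytes) -> None:
        self._files[name] = data

    def fetch(self, name: str) -> bytes:
        return self._files[name]


class LocalDiskBackend(FileBackend):
    """Keeps files under a root directory on local disk."""

    def __init__(self, root: Path) -> None:
        self.root = root

    def store(self, name: str, data: bytes) -> None:
        path = self.root / name
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)

    def fetch(self, name: str) -> bytes:
        return (self.root / name).read_bytes()


def upload(backend: FileBackend, name: str, data: bytes) -> None:
    # Application code depends only on the interface, not on a storage system.
    backend.store(name, data)


with tempfile.TemporaryDirectory() as tmp:
    for backend in (MemoryBackend(), LocalDiskBackend(Path(tmp))):
        upload(backend, "e/ex/Example.jpg", b"image bytes")
        print(type(backend).__name__, backend.fetch("e/ex/Example.jpg"))
```

A Swift-backed class implementing the same two methods could be swapped in without changing `upload`, which is the point of the abstraction.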

Launch Pad New Orleans, a great venue

We also ate dinner together, walked Bourbon Street, and generally got to know colleagues we’d never met before.  I expect these relationships will bear fruit for years to come.

Thanks to Ryan Lane and Dana Isokawa for organizing the event with me, and thanks to Launch Pad New Orleans for providing the venue!

Our next developers’ event is a hackathon in Mumbai November 18-20 concentrating on internationalization, localization, and mobile work.  To find out about other upcoming Wikimedia technical events, check the meetings wiki page, and follow @MediaWikiMeet on Identi.ca or Twitter.

Sumana Harihareswara
Volunteer Development Coordinator
Wikimedia Foundation

Native HTTPS support enabled for all Wikimedia Foundation wikis

We recently enabled protocol-relative URLs on all Wikimedia Foundation wikis. That change was in preparation for this change: we’ve just enabled native HTTPS support on all wikis.

What does this mean?

This means that you can now visit our sites using the HTTPS protocol; for instance, to visit the English Wikipedia, you can go to https://en.wikipedia.org. This lets you browse our sites without others on your network being able to see which pages you read, and log in without your password or session data being intercepted.

Didn’t we already have secure.wikimedia.org?

Yes. However, secure.wikimedia.org had a number of drawbacks:

  • The URLs for secure were different from the URLs for the projects; Commons was at https://secure.wikimedia.org/wikipedia/commons/wiki/Main_Page. Unless you followed a link there or were using the HTTPS Everywhere browser extension, it was difficult to know how to get there.
  • We have a bunch of dirty JavaScript hacks that were required to make secure work, thanks to the URL issue above.
  • We have a bunch of web server configuration hacks to make secure work properly.
  • Secure isn’t enabled for everything, thanks to the difficult configuration hacks needed to add new sites.
  • Secure isn’t scalable. Secure bypasses most of our caching layers, and is difficult to run on more than a single server.
  • Secure isn’t actually secure. It always has mixed content, due to the above issues and because upload.wikimedia.org didn’t have HTTPS support.

How is this better than secure.wikimedia.org?

  • The URLs are exactly the same, just the protocol is different.
  • The SSL servers only do SSL termination, and they do it before our caching layers. This means that this is a transparent addition to our architecture. When we scale other pieces of our architecture, this will scale with it.
  • We can remove our CSS, JavaScript, and web server hacks. This simplifies everything and makes mixed-content issues far less likely.

What will happen to secure.wikimedia.org?

Links pointing to secure.wikimedia.org will continue to work. The plan is to make URLs on secure.wikimedia.org redirect to the proper HTTPS URLs. secure.wikimedia.org will no longer act as a proxy for the sites.
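As an illustration of the kind of URL rewriting such a redirect involves, the sketch below maps an old-style secure.wikimedia.org URL onto the corresponding project URL. The rules are assumptions inferred from the Commons example earlier in this post, not the production redirect configuration; `redirect_target` and the `SPECIAL_WIKIS` table are hypothetical, and any other special-cased wikis are omitted.

```python
from urllib.parse import urlsplit

# Hypothetical special cases; everything else maps to <language>.<project>.org.
SPECIAL_WIKIS = {
    ("wikipedia", "commons"): "commons.wikimedia.org",
}


def redirect_target(secure_url: str) -> str:
    """Map https://secure.wikimedia.org/<project>/<language>/<rest> to its project URL."""
    parts = urlsplit(secure_url)
    project, lang, rest = parts.path.lstrip("/").split("/", 2)
    host = SPECIAL_WIKIS.get((project, lang), f"{lang}.{project}.org")
    query = f"?{parts.query}" if parts.query else ""
    return f"https://{host}/{rest}{query}"


print(redirect_target("https://secure.wikimedia.org/wikipedia/commons/wiki/Main_Page"))
# https://commons.wikimedia.org/wiki/Main_Page
print(redirect_target("https://secure.wikimedia.org/wikipedia/en/wiki/Main_Page"))
# https://en.wikipedia.org/wiki/Main_Page
```

Paths with fewer than three segments raise `ValueError` in this sketch; a real redirector would need a fallback for them.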

But, I still get mixed-content warnings!

Unfortunately, this project has a very long tail. Fixing all mixed-content warnings will be a long effort by MediaWiki core and extension developers and by the project communities. A number of templates, CSS pages, and JavaScript pages on the projects reference resources improperly, so those resources are loaded over plain HTTP. All resources should now be referenced using protocol-relative URLs (//<resource-url> instead of http://<resource-url>).
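Template, CSS and JavaScript maintainers looking for offending references could start with something like the sketch below, which lists absolute http:// URLs in a piece of text and rewrites only Wikimedia-hosted ones to protocol-relative form. It is a hypothetical helper, not a MediaWiki tool. It assumes that hosts under wikipedia.org and wikimedia.org serve HTTPS, and it deliberately leaves other hosts alone, because a protocol-relative link to a site without HTTPS support would break for readers on HTTPS.

```python
import re

# Any absolute http:// URL: a mixed-content candidate when the page is served over HTTPS.
HTTP_URL = re.compile(r"http://[^\s\"'()<>]+")

# Assumption: hosts under wikipedia.org and wikimedia.org serve HTTPS, so only
# these are rewritten. Group 1 captures the host name.
WIKIMEDIA_HOST = re.compile(
    r"http://((?:[a-z0-9-]+\.)*(?:wikipedia|wikimedia)\.org)\b", re.IGNORECASE
)


def find_insecure(text: str) -> list:
    """Return every absolute http:// URL found in the text."""
    return HTTP_URL.findall(text)


def make_protocol_relative(text: str) -> str:
    """Rewrite http://<wikimedia host> to //<wikimedia host>, leaving other hosts alone."""
    return WIKIMEDIA_HOST.sub(r"//\1", text)


css = "#logo { background: url(http://upload.wikimedia.org/logo.png) }"
print(find_insecure(css))           # ['http://upload.wikimedia.org/logo.png']
print(make_protocol_relative(css))  # #logo { background: url(//upload.wikimedia.org/logo.png) }
```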

Some links are bringing me back to HTTP mode

All of the links in our content have changed from being protocol-specific to protocol-relative. However, the previously rendered content is still cached in our Squid layer and in our parser cache. We don’t wish to clear our entire cache immediately to fix this, as that would cause severe performance issues. Instead, we will either clear the cache slowly over time or let it clear naturally. This will take at most a month.

How will this affect HTTPS Everywhere?

A lot of us use HTTPS Everywhere too, so we definitely want this updated as well. We’ve been working closely with the EFF to ensure the ruleset will be changed soon after we enable native HTTPS. MediaWiki developer Roan Kattouw, who is also the developer responsible for the protocol-relative URL support, made a git branch and changed and tested the new HTTPS rulesets for HTTPS Everywhere. Hopefully you’ll see the changes in place soon.

How is this implemented?

We have fairly detailed public documentation on how this is implemented. We’ve also very recently released our Puppet configuration in a public Git repository, so you can see the exact configuration as well.

As a bonus, this is also the configuration for our IPv6 proxies (which are currently enabled only for upload.wikimedia.org).

Ryan Lane
Operations Engineer

Protocol-relative URLs enabled on all Wikimedia Foundation wikis

In July we enabled protocol-relative URLs on testwiki and asked for bug reports. We did this in preparation for native HTTPS support for the sites. We received and fixed a number of bugs related to protocol-relative URLs, and then tested on a few of the larger wikis. Protocol-relative URL support is now stable enough to enable on all wikis, so today we’ve enabled it.

For information about what protocol-relative URLs are, why they are needed, and how they’ll affect you, see the post written in July. In brief: this changes most links we output in our content from looking like http://www.example.com to //www.example.com. The change shouldn’t affect you.
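The mechanism is the standard resolution rule for scheme-relative (network-path) references: the link keeps its host and path and inherits the scheme of the page it appears on. Python's `urllib.parse.urljoin` follows the same rule, so it can demonstrate the behaviour; the image URL below is a made-up example.

```python
from urllib.parse import urljoin

# Hypothetical protocol-relative link, as it would appear in page HTML.
link = "//upload.wikimedia.org/example.png"

print(urljoin("http://en.wikipedia.org/wiki/Main_Page", link))
# http://upload.wikimedia.org/example.png
print(urljoin("https://en.wikipedia.org/wiki/Main_Page", link))
# https://upload.wikimedia.org/example.png
```

API clients that receive protocol-relative URLs can resolve them the same way, against the URL they used to contact the API.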

If you find any bugs related to protocol-relative URLs, please submit a bug report. Known issues are linked from the tracking bug.

Ryan Lane
Operations Engineer

EDIT Sep 28 14:29 UTC: Because of reported breakage in iOS clients, the API’s action=parse interface has been hacked not to return protocol-relative URLs. This is a temporary hack that you should not rely on; fix your clients instead. For details, see http://lists.wikimedia.org/pipermail/mediawiki-api-announce/2011-September/000024.html

Ever wondered how the Wikimedia servers are configured?

Well, wonder no longer! To configure the Wikimedia servers, we use Puppet, a configuration management system, which lets us write code that manages all of our servers like a single large application. Of course, to really know how our servers are configured, you’d need to see our Puppet configuration.

Good news: we’ve just released our Puppet configuration in a public Git repository.
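Puppet manifests are written in Puppet's own declarative language, so the Python below is only a toy model of the underlying idea: you describe the state each server should be in, the tool compares that with the actual state, and it changes only what differs, so running it twice is harmless. The resource names and the `plan` and `converge` functions are hypothetical and do not reflect Puppet's internals.

```python
# Toy model of declarative configuration management (not Puppet's internals).
desired = {
    "package:varnish": "installed",
    "service:varnish": "running",
    "file:/etc/motd": "Managed by configuration management\n",
}


def plan(desired: dict, actual: dict) -> list:
    """List the resources whose actual state differs from the desired state."""
    return [key for key in sorted(desired) if actual.get(key) != desired[key]]


def converge(desired: dict, actual: dict) -> None:
    """Change only what differs; a real tool would install, start or write here."""
    for key in plan(desired, actual):
        actual[key] = desired[key]


server = {"package:varnish": "installed"}  # hypothetical current state of one host
print(plan(desired, server))  # ['file:/etc/motd', 'service:varnish']
converge(desired, server)
print(plan(desired, server))  # [] (a second run would change nothing)
```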

What is and isn’t included

Basically everything is included in the repository. We did spend a few weeks removing private and sensitive material from it, which we now keep in a private repository that is only available to Wikimedia staff and volunteers with root access.

This, of course, means that the Puppet configuration, as released, won’t completely work. The public repository references files and manifests in the private repository. To make the repository work, you’ll need to fill in the missing information. There isn’t very much in the private repository, though, so that task should be fairly easy.

The point of making this repository public

We have a couple of reasons for making this repository public:

  1. It shares knowledge with the world
  2. It lets us treat operations like a software development project

Both reasons align with our mission, but we were already sharing most of this knowledge via Wikitech. The second reason aligns more closely with our mission, as it lets the world be directly involved in our operations efforts.

Labs and community oriented operations

The release of this Puppet repository is the first step in the Wikimedia Test/Dev Labs project. We’ll be going further than just making the repository readable by the world. Part of the Test/Dev Labs project is to create a clone of our production cluster. This clone will run a branch of the Puppet repository.

Staff and community developers, and staff and community operations engineers will be able to push changes to the test branch of the Puppet repository, which will manage the cloned cluster. They’ll then be able to push these changes for review to the production branch of the Puppet repository. The staff operations engineers can then code-review the changes and push the changes out to the production systems.

Just as community members can edit the Wikimedia content, the site interface, and the site’s software (MediaWiki), they will be able to edit the site’s architecture as well.

Accessing the repository

Since this is a public Git repository, you can do an anonymous git clone like so:

git clone https://gerrit.wikimedia.org/r/p/operations/puppet

You can browse the repository through the gitweb interface. You can see the code review activity via Gerrit.

Ryan Lane
Operations Engineer

Does your Wikipedia mobile App expect our full content layout?

If so, we have an upcoming change this week that you should be aware of. We’re in the final stage of testing our new device detection, which will automatically redirect any mobile user agent (UA) we recognize to the corresponding .m mobile gateway. This means that if your app sends a mobile UA that WURFL recognizes and connects directly to us, we will redirect that traffic to .m.wikipedia.org and NOT to .wikipedia.org.
Apps that connect through an intermediate gateway that does not send a mobile user agent will not be affected. If, on the other hand, your app handles all of its logic itself, you will need to explicitly identify your UA to us, or ensure that your UA contains “bot” to bypass redirection.
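To make the rules above concrete, here is a simplified Python sketch of the redirect decision. It is a model, not the production implementation: real detection uses WURFL, which is stood in for here by a hypothetical `MOBILE_MARKERS` list; the check for “bot” is assumed to be a case-sensitive substring match; and `mobile_redirect` is an illustrative name.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit

# Hypothetical stand-in for WURFL device detection.
MOBILE_MARKERS = ("iPhone", "Android", "Opera Mini")


def mobile_redirect(url: str, user_agent: str) -> Optional[str]:
    """Return the .m URL to redirect to, or None to serve the request as-is."""
    if "bot" in user_agent:
        return None  # UAs containing "bot" bypass redirection (case-sensitive by assumption)
    if not any(marker in user_agent for marker in MOBILE_MARKERS):
        return None  # not recognized as a mobile device
    parts = urlsplit(url)
    lang, domain = parts.netloc.split(".", 1)  # e.g. "en", "wikipedia.org"
    if domain.startswith("m."):
        return None  # already on the mobile gateway
    return urlunsplit(parts._replace(netloc=f"{lang}.m.{domain}"))


print(mobile_redirect("http://en.wikipedia.org/wiki/Swift", "Mozilla/5.0 (iPhone)"))
# http://en.m.wikipedia.org/wiki/Swift
print(mobile_redirect("http://en.wikipedia.org/wiki/Swift", "ExampleApp/1.0 (iPhone; bot)"))
# None
```

An app that fetches pages directly can opt out by sending a User-Agent header containing “bot”, for example through the `headers` argument of `urllib.request.Request`.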

If this is not the behavior that you want, please let us know on Meta or come find us in #wikimedia-mobile on Freenode.

Tomasz Finc

Director of Mobile and Special Projects