Wikimedia blog

News from inside the Wikimedia Foundation.org

Operations

Scaling media storage at Wikimedia with Swift

Wikipedia is huge. Almost four million articles in English alone — but as they say, a picture is worth a thousand words (actually, it’s usually closer to several million). In terms of raw bits on disk, the largest project is clearly the Wikimedia Commons, the free media repository integrated with all of the Wikimedia projects. In addition, many projects allow their own local media uploads. As a result, across all wikis, Wikimedia stores millions of images, sounds, and other media files.

We’ve been able to manage the load for quite a while by using two servers with lots of local storage (10 and 30TB), but we’re pushing against that limit and would like a more fault-tolerant option. So, for the last few months, we have been working on replacing the infrastructure that holds all that data.

Our goal is a storage system that lets us scale more easily and accept large collections of media, such as those from Wiki Loves Monuments and the U.S. National Archives’ donation of its collection of photographs by Ansel Adams.

After evaluating a number of options, we chose to pursue OpenStack Swift. Swift is a distributed object storage system with automatic replication, so that if one host has problems, the requested file is retrieved from another server with no interruption of service. Aside from meeting our needs around performance, reliability, and scalability, it is a good fit considering we are also using OpenStack products for Wikimedia Labs.
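To make the replication idea concrete, here is a minimal Python sketch of hash-based replica placement with failover on read. It is a deliberate simplification and not Swift’s actual ring implementation: the node names, the replica count of 3, and both helper functions are assumptions made up for illustration.

```python
import hashlib

# Hypothetical node names, chosen only for illustration.
NODES = ["storage1", "storage2", "storage3", "storage4"]
REPLICAS = 3  # assumed replica count, not taken from the post


def replica_nodes(object_path, nodes=NODES, replicas=REPLICAS):
    """Pick `replicas` distinct nodes for an object, starting from a hash of its path."""
    digest = hashlib.sha256(object_path.encode("utf-8")).hexdigest()
    start = int(digest, 16) % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(replicas)]


def read_object(object_path, down=frozenset()):
    """Return the first healthy node holding the object, skipping nodes that are down."""
    for node in replica_nodes(object_path):
        if node not in down:
            return node
    raise RuntimeError("all replicas unavailable for " + object_path)
```

Calling `read_object(path, down={"storage2"})` simply moves on to the next replica when the first choice is unavailable, which is the behaviour the paragraph above describes.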

We have just completed the first milestone along the road to replacing our existing storage systems with Swift: all image thumbnails (scaled images such as a 320px version of a picture) are now stored on Swift. Our current production Swift cluster is made up of 4 back-end storage nodes with 22TB each and 2 front-end proxy nodes that handle user web requests. This new architecture provides us the scalability and reliability we need going forward.

Over the next few months we will build a second Swift cluster in our Virginia data center, then work on migrating all of the original media over to Swift as well. For more detail on the implementation and plan for Swift, you can read up on the documentation on Wikitech, ask questions in the comments below, or come and visit us in #wikimedia-tech on Freenode in IRC.

Ben Hartshorne
Operations
Wikimedia Foundation

Techies learn, make, win at Foundation’s first San Francisco hackathon

Participants at the San Francisco hackathon in January 2012

In January, 92 participants gathered in San Francisco to learn about Wikimedia technology and to build things in our first Bay Area hackathon.

After a kickoff speech by Foundation VP of Engineering Erik Möller (video), we led tutorials on the MediaWiki web API, customizing wikis with JavaScript user scripts and Gadgets, and building the Wikipedia Android app.  (We recorded each training; click those links for how-to guides and videos.)  We asked the participants to self-organize into teams and work on projects.  After their demonstration showcase, judges awarded a few prizes to the best demos.

Nobody notices when it’s not broken: New database servers deployed

The Technical Operations team has just completed behind-the-scenes work that will likely never be noticed by our readers.

Our External Storage databases hold the text for every version of every wiki page; they have slowly grown over the life of Wikipedia and its sister projects. Ten years is a lifetime on the Internet, and the incremental changes that were made to our external storage system over that period, though appropriate at the time they were made, resulted in a setup that was a challenge to maintain and which was becoming unreliable.

Graph of query duration

We spent a few weeks analyzing all the various servers across which the page text data was spread, in order to gather it all together onto a single host. From there, it could be replicated onto newer, more reliable and higher performance hardware. Along the way, we found and fixed a number of inconsistencies to make the dataset more regular.

The deployment of the new hardware lasted a few days (as we moved things piece by piece) and was finished this past Monday with no fanfare. There was a brief period (about 10 minutes) during which articles could not be edited while we switched writes to the new hardware. The end result is a barrage of small improvements, all of which together make for a happy TechOps team:

  • average query duration has dropped from about 15ms to around 8ms and the worst case from 576ms down to 60ms;
  • replication and failover processes are now well known and standardized;
  • total hardware used has dropped from around 30 servers to 8, now in two locations;
  • hosts no longer double up as web servers and database servers for text; dedicated servers are used for the database.

It’s a small victory in the battle against entropy, but an important prerequisite for carrying out our mission of providing unfettered and reliable access to the sum of all knowledge.

Ben Hartshorne, Operations Engineer

Tech meetup moves Wikimedia infrastructure forward

Earlier this month, about thirty MediaWiki developers and interested technologists gathered in New Orleans to learn and to work on Wikimedia’s technical infrastructure.  We made broad progress on the infrastructure of innovation at Wikimedia (notes).  Specifically:

Tim Starling and DJ Bauch driving towards greater media file storage system independence and robustness

  • We are now much closer to officially opening the doors to Wikimedia Labs and giving far more people the ability to contribute to MediaWiki without having to set up and maintain their own development environments at home.  Wikimedia Labs will provide hosted, virtualized test and development sandboxes for new and experienced programmers and systems administrators.  Many developers got beta Labs accounts, we tested at a larger scale, and we fixed several bugs.
  • Developers agreed to create a file backend abstraction layer to enable large-scale MediaWiki installations to use one of several storage systems to contain big collections of big media files.  (Wikimedia plans on using Swift, which is open source.) Microsoft’s Ben Lobaugh and SAIC’s DJ Bauch collaborated towards improving MediaWiki’s performance on Microsoft technologies as well.  Developers made architectural decisions, refactored some existing code, and improved documentation and tests for the SwiftMedia extension to MediaWiki.
  • We now have a continuous integration server up and running.  This will continuously run tests checking on the latest new features and bugfixes that developers write, resulting in fewer bugs and faster development. Developers will need to write tests to reap the benefits, so Chad Horohoe taught a test-writing workshop.

    Chad Horohoe teaching developers unit testing

  • Max Semenik finished and demonstrated the first version of his API Query Sandbox.  This allows software developers anywhere to experiment with ways to automatically get data from Wikipedia or other sites that run MediaWiki, thus enabling wider and deeper reuse of Wikimedia content.
  • Operations folks continued the Puppetization of our infrastructure: they completely reworked Varnish management in Puppet, and worked on Puppet configurations for SwiftMedia testing. This configuration management work will ensure that ops can move faster and more confidently in building and maintaining Wikimedia infrastructure. And Canonical’s Mark Mims and Kapil Thangavelu worked on improving methods for Wikimedia developers “to spin up stacks of services within the labs environment” using Juju (more details).
  • Since the engineering department is planning a switch from Subversion to Git in the next few months, Brion Vibber taught nearly everyone there how Git works (slides, audio), and how we’ll be using Git in the future. This change in our source code repository and workflow will, we hope, enable more speed and flexibility in development, both for WMF developers and community contributors.

    Brion Vibber leading developers into the "glorious Git future"
  • We prioritized and addressed several open requests for the operations team and defect reports about the latest version of MediaWiki, 1.18, which had just been deployed across WMF sites.
  • Roan found and fixed an issue that was spouting symbolic link errors into our Apache logs, so now it’ll be easier for us to see more dangerous errors in those logs.
  • Google Summer of Code students Salvatore Ingala and Kevin Brown made progress on integrating their summers’ work into MediaWiki as used and deployed by others; Salvatore and WMF developer Roan Kattouw have a plan for getting his user scripts improvements reviewed and deployed, so they can benefit Wikimedia readers and editors.
  • A volunteer came in on Friday night knowing nothing about developing for MediaWiki, and by the end of the weekend had a working development environment on her laptop and had some ideas about how to contribute.
  • We had substantive conversations about the summer internship program and about third-party collaboration that will affect how we work in the future.

Launch Pad New Orleans, a great venue

We also ate dinner together, walked Bourbon Street, and generally got to know colleagues we’d never met before.  I expect these relationships will bear fruit for years to come.

Thanks to Ryan Lane and Dana Isokawa for organizing the event with me, and thanks to Launch Pad New Orleans for providing the venue!

Our next developers’ event is a hackathon in Mumbai November 18-20 concentrating on internationalization, localization, and mobile work.  To find out about other upcoming Wikimedia technical events, check the meetings wiki page, and follow @MediaWikiMeet on Identi.ca or Twitter.

Sumana Harihareswara
Volunteer Development Coordinator
Wikimedia Foundation

Native HTTPS support enabled for all Wikimedia Foundation wikis

We recently enabled protocol-relative URLs on all Wikimedia Foundation wikis. That change was made in preparation for this one: we’ve just enabled native HTTPS support on all wikis.

What does this mean?

This means that you can now visit sites using the HTTPS protocol; for instance, if you wish to visit the English Wikipedia, you can go to: https://en.wikipedia.org. This allows you to visit our sites without having your browsing habits tracked, and you can log in without having your password or user session data stolen.

Didn’t we already have secure.wikimedia.org?

Yes. However, there were a number of cons associated with secure.wikimedia.org:

  • The URLs for secure were different than the URLs for the projects; Commons was at: https://secure.wikimedia.org/wikipedia/commons/wiki/Main_Page. Unless you followed a link there or were using the HTTPS Everywhere browser extension it would be difficult to know how to get there.
  • We have a bunch of dirty JavaScript hacks that were required to make secure work, thanks to the URL issue above.
  • We have a bunch of web server configuration hacks to make secure work properly.
  • Secure isn’t enabled for everything, thanks to the difficult configuration hacks needed to add new sites.
  • Secure isn’t scalable. Secure bypasses most of our caching layers, and is difficult to run on more than a single server.
  • Secure isn’t actually secure. It always has mixed-content due to the above issues and because upload.wikimedia.org didn’t have HTTPS support.

How is this better than secure.wikimedia.org?

  • The URLs are exactly the same, just the protocol is different.
  • The SSL servers only do SSL termination, and they do it before our caching layers. This means that this is a transparent addition to our architecture. When we scale other pieces of our architecture, this will scale with it.
  • We can remove our CSS, JavaScript, and web server hacks. This simplifies everything, and makes it far less likely we’ll have mixed-content issues.

What will happen to secure.wikimedia.org?

Links pointing to secure.wikimedia.org will continue to work. The plan is to make URLs on secure.wikimedia.org redirect to the proper HTTPS URLs. secure.wikimedia.org will no longer act as a proxy for the sites.
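As a rough illustration of what such a redirect has to compute, the Python sketch below maps a secure.wikimedia.org URL to a native HTTPS URL. The path layout (`/<project>/<site>/<rest>`), the short list of sites living under wikimedia.org, and the function name are assumptions for the example; this is not the production redirect configuration.

```python
from urllib.parse import urlsplit

# Assumption: these sites live under wikimedia.org rather than <lang>.<project>.org.
# The list is illustrative, not exhaustive.
WIKIMEDIA_ORG_SITES = {"commons", "meta"}


def native_https_url(secure_url):
    """Map a secure.wikimedia.org URL to its native HTTPS equivalent, or return None."""
    parts = urlsplit(secure_url)
    if parts.hostname != "secure.wikimedia.org":
        return None
    # Assumed legacy path shape: /<project>/<site>/<rest>
    segments = parts.path.lstrip("/").split("/", 2)
    if len(segments) < 3:
        return None
    project, site, rest = segments
    if site in WIKIMEDIA_ORG_SITES:
        host = site + ".wikimedia.org"
    else:
        host = site + "." + project + ".org"
    url = "https://" + host + "/" + rest
    if parts.query:
        url += "?" + parts.query
    return url
```

With the Commons URL quoted earlier, `native_https_url("https://secure.wikimedia.org/wikipedia/commons/wiki/Main_Page")` yields `https://commons.wikimedia.org/wiki/Main_Page`.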

But, I still get mixed-content warnings!

Unfortunately, this project has a very long tail. Fixing all mixed-content warnings will be a long effort by both the MediaWiki core and extension developers and the project communities. A number of templates, CSS, and JavaScript on projects are improperly referencing resources, and as such, they are being loaded incorrectly. All resources should be referenced using protocol-relative URLs now (//<resource-url> vs http://<resource-url>).
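For anyone cleaning up templates or scripts, the rewrite itself is mechanical. Here is a small Python sketch of it; the function name is made up for this example, and real cleanups will of course happen in wikitext, CSS, or JavaScript rather than Python.

```python
import re

# Matches an explicit http:// or https:// scheme at the start of a URL.
_SCHEME = re.compile(r"^https?://", re.IGNORECASE)


def to_protocol_relative(url):
    """Drop an explicit http(s) scheme so the URL inherits the page's protocol."""
    return _SCHEME.sub("//", url, count=1)
```

URLs that are already protocol-relative, and site-relative paths such as `/wiki/Main_Page`, pass through unchanged.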

Some links are bringing me back to HTTP mode

All of the links in our content have changed from being protocol-specific to protocol-relative. This content is cached in our squid layer, and in our parser cache. We don’t wish to clear our entire cache immediately to fix this, as it would cause severe performance issues. Instead we will either clear the cache slowly over time, or we’ll let it clear naturally. This will take at most a month.

How will this affect HTTPS Everywhere?

A lot of us use HTTPS Everywhere too, so we definitely want this updated as well. We’ve been working closely with the EFF to ensure the ruleset will be changed soon after we enable native HTTPS. MediaWiki developer Roan Kattouw, who is also the developer responsible for the protocol-relative URL support, made a git branch and changed and tested the new HTTPS rulesets for HTTPS Everywhere. Hopefully you’ll see the changes in place soon.

How is this implemented?

We have fairly detailed public documentation on how this is implemented. We’ve also very recently released our Puppet configuration in a public Git repository, so you can see the exact configuration as well.

As a bonus, this is also the configuration for our IPv6 proxies (which are currently enabled only for upload.wikimedia.org).

Ryan Lane
Operations Engineer

Protocol-relative URLs enabled on all Wikimedia Foundation wikis

In July we enabled protocol-relative URLs on testwiki, and asked for bug reports. We did this in preparation for native HTTPS support for the sites. We received and fixed a number of bugs related to protocol-relative URLs, and then tested on a few of the larger wikis. We are now at a point where protocol-relative URL support is stable enough to enable it on all wikis, so today we’ve enabled it.

For information about what protocol-relative URLs are, why they are needed, and how it’ll affect you, see the post written in July. In brief: this changes most links we output in our content from looking like http://www.example.com to //www.example.com . The change shouldn’t affect you.

If you find any bugs related to protocol-relative URLs, please submit a bug report. Known issues are linked from the tracking bug.

Ryan Lane
Operations Engineer

EDIT Sep 28 14:29 UTC: Because of reported breakage in iOS clients, the API’s action=parse interface has been hacked not to return protocol-relative URLs. This is a temporary hack that you should not rely on; fix your clients instead. For details, see http://lists.wikimedia.org/pipermail/mediawiki-api-announce/2011-September/000024.html

Ever wondered how the Wikimedia servers are configured?

Well, wonder no longer! To configure the Wikimedia servers, we use Puppet, a configuration management system, which lets us write code that manages all of our servers like a single large application. Of course, to really know how our servers are configured, you’d need to see our Puppet configuration.

Good news: we’ve just released our Puppet configuration in a public Git repository.

What is and isn’t included

Basically everything is included in the repository. We spent a few weeks removing private and sensitive things from the repository, though. We have these in a private repository that is only available to Wikimedia staff and volunteers with root access.

This, of course, means that the puppet configuration, as released, won’t completely work. The public repository makes references to files and manifests in the private repository. To make the repository work, you’ll need to fill in the missing information. There isn’t very much in the private repository, though, so that task should be fairly easy.

The point of making this repository public

We have a couple of reasons for making this repository public:

  1. It shares knowledge with the world
  2. It lets us treat operations like a software development project

Both reasons align with our mission, but we were already mostly sharing this knowledge via wikitech. The second reason aligns more closely with our mission, as it allows us to let the world be directly involved in our operations efforts.

Labs and community oriented operations

The release of this Puppet repository is the first step in the Wikimedia Test/Dev Labs project. We’ll be going further than just making the repository readable by the world. Part of the Test/Dev Labs project is to create a clone of our production cluster. This clone will run a branch of the puppet repository.

Staff and community developers, and staff and community operations engineers will be able to push changes to the test branch of the Puppet repository, which will manage the cloned cluster. They’ll then be able to push these changes for review to the production branch of the Puppet repository. The staff operations engineers can then code-review the changes and push the changes out to the production systems.

Just as they can edit the Wikimedia content, the site interface, and the site’s software (MediaWiki), community members will be able to edit the site’s architecture as well.

Accessing the repository

Since this is a public Git repository, you can do an anonymous git clone like so:

git clone https://gerrit.wikimedia.org/r/p/operations/puppet

You can browse the repository through the gitweb interface. You can see the code review activity via Gerrit.

Ryan Lane
Operations Engineer

Does your Wikipedia mobile App expect our full content layout?

If so, we have an upcoming change this week that you should be aware of. We’re in the final part of testing our new device detection, which will automatically redirect any mobile user agent (UA) we recognize to its corresponding .m mobile gateway. This means that if your app declares a mobile UA as recognized by WURFL and connects directly to us, we will redirect that traffic to .m.wikipedia.org and NOT .wikipedia.org.
Apps that go through an intermediate gateway without a mobile user agent will not be affected. If, on the other hand, your app handles all of its logic itself, then you will need to explicitly identify your UA to us, or ensure that your UA contains “bot” to bypass redirection.
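To help app authors reason about which requests will be redirected, here is a rough Python sketch of the decision described above. The substring list is a hypothetical stand-in for WURFL detection, and the function name and host handling are assumptions for illustration, not our actual gateway code.

```python
# Hypothetical stand-in for WURFL device detection: a few substrings only.
MOBILE_TOKENS = ("iphone", "android", "blackberry")


def redirect_host(host, user_agent):
    """Return the mobile host to redirect to, or None if no redirect applies."""
    ua = user_agent.lower()
    if "bot" in ua:
        return None  # a UA containing "bot" bypasses redirection
    if not any(token in ua for token in MOBILE_TOKENS):
        return None  # not recognized as a mobile UA
    if ".m." in host or not host.endswith(".wikipedia.org"):
        return None  # already on the mobile gateway, or not a language subdomain
    lang = host[: -len(".wikipedia.org")]
    return lang + ".m.wikipedia.org"
```

For example, a request to `en.wikipedia.org` with an iPhone UA would be sent to `en.m.wikipedia.org`, while the same request with “bot” in the UA would be left alone.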

If this is not the behavior that you want, then please let us know on meta or come find us on freenode #wikimedia-mobile.

Tomasz Finc

Director of Mobile and Special Projects

Protocol relative URLs enabled on test.wikipedia.org

In preparation for enabling HTTPS on Wikimedia Foundation sites, we’ve recently enabled protocol relative URLs on test.wikipedia.org. Protocol relative URLs are needed to make the site work properly in both HTTP and HTTPS modes.

What are protocol relative URLs?

Normal URLs look like: http://test.wikipedia.org/wiki/Main_Page or https://test.wikipedia.org/wiki/Main_Page. Both of these URLs define the protocol that will be used. Protocol relative URLs look like this: //test.wikipedia.org/wiki/Main_Page. Dropping the protocol from the URL allows the browser to assign the current protocol to the URL. So, if you are visiting the site in HTTPS mode, links will point to HTTPS, and if you are visiting the site in HTTP mode, links will point to HTTP.
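The same resolution rule can be seen outside a browser. This short Python sketch uses the standard library’s `urljoin` to resolve one protocol-relative link against an HTTP page and an HTTPS page; the `Sandbox` page name is just a placeholder for this example.

```python
from urllib.parse import urljoin

link = "//test.wikipedia.org/wiki/Main_Page"

# The same link resolves differently depending on the page it appears on.
from_http = urljoin("http://test.wikipedia.org/wiki/Sandbox", link)
from_https = urljoin("https://test.wikipedia.org/wiki/Sandbox", link)
```

Here `from_http` keeps the `http` scheme and `from_https` keeps `https`, while the host and path come from the link itself.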

Why are protocol relative URLs needed?

We need to use protocol relative URLs for a couple of reasons:

  1. All requests are served by our caching layer (squid or varnish). If you are browsing the site in HTTPS mode, and another user is browsing the same pages in HTTP mode, two versions of those pages will be stored in our cache, as the links are different between the two modes. This splits our cache, which makes it less efficient and more expensive to operate.
  2. When browsing in HTTPS mode, we want to ensure links point to the correct protocol. When pages are parsed, things like interwiki links are created by the parser. If we do not use protocol relative URLs, then links will point to either HTTPS or HTTP, which will cause users to switch modes randomly.
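The cache-splitting point in the first reason can be shown with a toy model. In this Python sketch, `render` and `cached_variants` are hypothetical helpers, not part of MediaWiki or our caching layer; they only count how many distinct page bodies a cache would need to hold for HTTP and HTTPS readers.

```python
def render(protocol, protocol_relative):
    """Render a tiny page body; `protocol` is the mode the reader is browsing in."""
    prefix = "" if protocol_relative else protocol + ":"
    return '<a href="' + prefix + '//test.wikipedia.org/wiki/Main_Page">Main Page</a>'


def cached_variants(protocol_relative):
    """Count distinct bodies the cache would have to hold for HTTP and HTTPS readers."""
    return len({render(p, protocol_relative) for p in ("http", "https")})
```

With protocol-specific links the two modes produce two different bodies; with protocol-relative links they produce a single body that both modes can share.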

How does this affect me?

It shouldn’t. Things should continue to work as before. We are currently testing this out on some internal wikis, and have enabled it on test.wikipedia.org so that the entire community will have a couple weeks to test it out before we enable it on all projects.

API users, especially, should test thoroughly. The API, in most cases, will not output protocol-relative URLs, but will continue to output http:// URLs no matter whether you call it over HTTP or HTTPS. This is because we don’t expect API clients to be able to resolve protocol relative URLs correctly, and because the context of these URLs (which is needed to resolve them) will frequently get lost along the way.

The exceptions to this are:

  • HTML produced by the parser will have protocol-relative URLs in <a href="…"> tags etc.
  • prop=extlinks and list=exturlusage will output URLs verbatim as they appear in the article, which means they may output protocol-relative URLs

If you are getting protocol-relative URLs in some other place in the API, that’s likely a bug.

If you notice any issues related to protocol relative URLs, in the API or not, please let us know.
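For API clients that no longer have the page context at hand, one simple way to cope is to attach an explicit scheme to any protocol-relative URL received. The Python sketch below is only one possible approach: the function name, the `https` default, and the sample data are assumptions for illustration.

```python
def absolutize(url, default_scheme="https"):
    """Give a protocol-relative URL an explicit scheme; leave other URLs alone."""
    if url.startswith("//"):
        return default_scheme + ":" + url
    return url


# Hypothetical sample of values such as prop=extlinks might return.
extlinks = ["//www.example.com/a", "http://www.example.com/b"]
resolved = [absolutize(u) for u in extlinks]
```

A client that knows which protocol it used to call the API could pass that as `default_scheme` instead of relying on the default.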

Note: we’ve also enabled HTTPS on test.wikipedia.org; so, please do test protocol relative URLs in HTTP and HTTPS modes. There is at least one known bug with regards to HTTPS mode and redirects, which will be fixed soon. More to come on this in a later post.

Ryan Lane

Server Decommission Donations

At this time we have closed submissions.  We have received well over 100 requests, and will not have enough servers to cover those, let alone more.  Thanks for the submissions!  ~ RobH @ 2011-06-18 @ 10:00 EST

Due to the overwhelming response, we will unfortunately not be able to reply to everyone on an individual basis.  If your organization is selected, you will receive an email from us indicating the approval, as well as shipping information. ~ RobH @ 2011-06-30 @ 13.30 EST

 

As always, the Wikimedia Foundation has been upgrading and adding servers to keep up with traffic demand and capacity growth. Recently, we replaced some of our older servers with faster, higher-capacity, and more energy-efficient ones. The older servers are now decommissioned and will be donated. Do note that they are more than 3 years old and out of warranty. While we may have placed a lot of demand on them over the years, they are in fine working condition.

Most systems (but possibly not all) have the following specifications:

  • Dual CPU 2.5 GHz
  • From 3GB to 24GB of RAM, depending on role.
  • Most have 80 GB or larger HDD (some have two hard drives, some drives are 160GB or possibly even 250GB)

If you are interested, please provide the following information in your email to us:

  • Registered non-profit name and information.
  • Your contact information, including email address, phone number, and relationship with requesting non-profit.
  • Information on the non-profit, their charter, mission, and goals.
  • Shipping address information for a FedEx Ground delivery (i.e., the shipment destination)*
  • How the servers will be used.  (We like to know and share with folks!)

* At this time we regret that we are only able to ship servers to USA-based non-profits.  This is due to the cost of shipping and the various export laws and taxes that apply when shipping internationally.

Please provide as much detail as possible on how you plan to use the servers. For example, ‘Wikimedia will use these for our sites.’ is pretty vague, whereas ‘Wikimedia is the non-profit foundation that runs Wikipedia.  Server donations to us would be used to run our websites that allow access to Wikipedia and its sister projects.’ is much clearer.

If you are not a registered non-profit, the server(s) must be used in a way that works with or on the projects of the Wikimedia Foundation.  We are not donating these servers to private individuals for personal use.  All requests that are not for use on Wikimedia projects or are not going to a non-profit will be ignored.

By submitting a request and possibly accepting servers from us, you are granting the Wikimedia Foundation permission to publish details of the donation.  This is normally (but not limited to) a quick blurb about it on our Tech Blog (http://techblog.wikimedia.org).

The Wikimedia Foundation provides no guarantee of the hardware donated in any manner.  Any use of the hardware is not the responsibility of the Wikimedia Foundation.

All requests will be reviewed by our technical team, who will reply regarding server availability.  Please keep in mind that these are handled at low priority, with our normal operations taking precedence.  There may be delays in shipping out your request, or we may simply run out of servers.

Rob Halsell

Wikimedia Operations Engineer