Wikimedia blog

News from the Wikimedia Foundation and about the Wikimedia movement

Posts by Ryan Lane

Opening our operations with Wikimedia Labs

For the past year and a half we’ve been working on a project named Wikimedia Labs, which enables us to invite our community to contribute to how our sites are run. Labs is a cloud computing environment using OpenStack for development, testing and deployment of Wikimedia’s infrastructure as a whole, enabling us to treat our infrastructure as an open source software project.

The problems we’re solving

When Wikipedia and its sister projects started, volunteers had root level access on our infrastructure. They were the only roots and most of the infrastructure they built is still in use today. Our lenient access policy made us flexible, so changes could happen quickly. Also, the sites were smaller, had far fewer users, and large, fundamental changes could be made in production.

Growth has made us less willing to give out root access to volunteers. Because of the size of our sites, downtime is less acceptable. But having fewer volunteers means we have less ideas, and due to that, our ability to make changes quickly is decreased. We haven’t had a new volunteer root in years. We haven’t even had a new volunteer with shell access. Engaging volunteers and enabling them to easily contribute is a wider problem as well.

Our software development community scales with volunteers. Unfortunately, operations doesn’t scale in a similar way right now. We’re limited to the staff operations engineers we currently have. The staff is great, but the fact that operations can’t scale to meet the needs of a large growth of developers means that operations is a bottleneck. Furthermore, our access policy prevents volunteer developers from learning how our infrastructure works.

This leads to a situation where our staff developers and volunteer developers can’t easily collaborate. Our volunteers also have no way of appropriately testing their changes, since our infrastructure is complex and difficult to replicate. This means it’s harder to take contributions, which further slows the pace of changes on our sites.

(more…)

Native HTTPS support enabled for all Wikimedia Foundation wikis

We recently enabled protocol-relative URLs on all Wikimedia Foundation wikis. That change was in preparation for this change: we’ve just enabled native HTTPS support on all wikis.

What does this mean?

This means that you can now visit sites using the HTTPS protocol; for instance, if you wish to visit the English Wikipedia, you can go to: https://en.wikipedia.org. This allows you to visit our sites without having your browsing habits tracked, and you can log in without having your password or user session data stolen.

Didn’t we already have secure.wikimedia.org?

Yes. However, there were a number of cons associated with secure.wikimedia.org:

  • The URLs for secure were different than the URLs for the projects; Commons was at: https://secure.wikimedia.org/wikipedia/commons/wiki/Main_Page. Unless you followed a link there or were using the HTTPS Everywhere browser extension it would be difficult to know how to get there.
  • We have a bunch of dirty JavaScript hacks that were required to make secure work, thanks to the URL issue above.
  • We have a bunch of web server configuration hacks to make secure work properly.
  • Secure isn’t enabled for everything, thanks to the difficult configuration hacks needed to add new sites.
  • Secure isn’t scalable. Secure bypasses most of our caching layers, and is difficult to run on more than a single server.
  • Secure isn’t actually secure. It always has mixed-content due to the above issues and because upload.wikimedia.org didn’t have HTTPS support.

How is this better than secure.wikimedia.org?

  • The URLs are exactly the same, just the protocol is different.
  • The SSL servers only do SSL termination, and they do it before our caching layers. This means that this is a transparent addition to our architecture. When we scale other pieces of our architecture, this will scale with it.
  • We can remove our CSS, Javascript, and web server hacks. This simplifies everything, and makes it far less likely we’ll have mixed-content issues.

What will happen to secure.wikimedia.org?

Links pointing to secure.wikimedia.org will continue to work. The plan is to make URLs on secure.wikimedia.org redirect to the proper HTTPS URLs. secure.wikimedia.org will no longer act as a proxy for the sites.

But, I still get mixed-content warnings!

Unfortunately, the long-tail of this project is very long. Fixing all mixed-content warnings will be a long effort by both the MediaWiki core and extension developers and the project communities. A number of templates, CSS, and Javascript on projects are improperly referencing resources, and as such, they are being loaded incorrectly. All resources should be referenced using protocol-relative URLs now (//<resource-url> vs http://<resource-url>).

Some links are bringing me back to HTTP mode

All of the links in our content have changed from being protocol-specific to protocol-relative. This content is cached in our squid layer, and in our parser cache. We don’t wish to clear our entire cache immediately to fix this, as it would cause severe performance issues. Instead we will either clear the cache slowly over time, or we’ll let it clear naturally. This will take at most a month.

How will this affect HTTPS Everywhere?

A lot of us use HTTPS Everywhere too, so we definitely want this updated as well. We’ve been working closely with the EFF to ensure the ruleset will be changed soon after we enable native HTTPS. MediaWiki developer Roan Kattouw, who is also the developer responsible for the protocol-relative URL support, made a git branch and changed and tested the new HTTPS rulesets for HTTPS Everywhere. Hopefully you’ll see the changes in place soon.

How is this implemented?

We have fairly detailed public documentation on how this is implemented. We’ve also very recently released our puppet configuration in a public git repository, so you can see the exact configuration as well:

As a bonus, this is also the configuration for our IPv6 proxies (which are only currently enabled for upload.wikimedia.org).
Ryan Lane
Operations Engineer

Protocol-relative URLs enabled on all Wikimedia Foundation wikis

In July we enabled protocol-relative URLs on testwiki, and asked for bug reports. We did this in preparation for native HTTPS support for the sites. We received and fixed a number of protocol-relative related bugs, and then tested on a few of the larger wikis. We are now at a point where protocol-relative URL support is stable enough to enable it on all wikis, so today we’ve enabled it.

For information about what protocol-relative URLs are, why they are needed, and how it’ll affect you, see the post written in July. In brief: this changes most links we output in our content from looking like http://www.example.com to //www.example.com . The change shouldn’t affect you.

If you find any bugs related to protocol-relative URLs, please submit a bug report. Known issues are linked from the tracking bug.

Ryan Lane
Operations Engineer

EDIT Sep 28 14:29 UTC: Because of reported breakage in iOS clients, the API’s action=parse interface has been hacked not to return protocol-relative URLs. This is a temporary hack that you should not rely on; fix your clients instead. For details, see http://lists.wikimedia.org/pipermail/mediawiki-api-announce/2011-September/000024.html

Ever wondered how the Wikimedia servers are configured?

Well, wonder no longer! To configure the Wikimedia servers, we use Puppet, a configuration management system, which lets us write code that manages all of our servers like a single large application. Of course, to really know how our servers are configured, you’d need to see our Puppet configuration.

Good news: we’ve just released our Puppet configuration in a public Git repository.

What is and isn’t included

Basically everything is included in the repository. We spent a few weeks removing private and sensitive things from the repository, though. We have these in a private repository that is only available to Wikimedia staff and volunteers with root access.

This, of course, means that the puppet configuration, as released, won’t completely work. The public repository makes references to files and manifests in the private repository. To make the repository work, you’ll need to fill in the missing information. There isn’t very much in the private repository, though, so that task should be fairly easy.

The point of making this repository public

We have a couple reasons for making this repository public:

  1. It shares knowledge with the world
  2. It lets us treat operations like a software development project

Both reasons align with our mission, but we were already mostly sharing this knowledge via wikitech. The second reason aligns more closely with our mission, as it allows us to let the world be directly involved in our operations efforts.

Labs and community oriented operations

The release of this Puppet repository is the first step in the Wikimedia Test/Dev Labs project. We’ll be going further than just making the repository readable by the world. Part of the Test/Dev Labs project is to create a clone of our production cluster. This clone will run a branch of the puppet repository.

Staff and community developers, and staff and community operations engineers will be able to push changes to the test branch of the Puppet repository, which will manage the cloned cluster. They’ll then be able to push these changes for review to the production branch of the Puppet repository. The staff operations engineers can then code-review the changes and push the changes out to the production systems.

Like the Wikimedia content, the site interface, and the site’s software (MediaWiki), community members will be able to edit the site’s architecture as well.

Accessing the repository

Since this is a public Git repository, you can do an anonymous git clone like so:

git clone https://gerrit.wikimedia.org/r/p/operations/puppet

You can browse the repository through the gitweb interface. You can see the code review activity via Gerrit.

Ryan Lane
Operations Engineer

Protocol relative URLs enabled on test.wikipedia.org

In preparation for enabling HTTPS on Wikimedia Foundation sites, we’ve recently enabled protocol relative URLs on test.wikipedia.org. Protocol relative URLs are needed to make the site work properly in both HTTP and HTTPS modes.

What are protocol relative URLs?

Normal URLs look like: http://test.wikipedia.org/wiki/Main_Page or https://test.wikipedia.org/wiki/Main_Page. Both of these URLs define the protocol that will be used. Protocol relative URLs look like this: //test.wikipedia.org/wiki/Main_Page. Dropping the protocol from the URL allows the browser to assign the current protocol to the URL. So, if you are visiting the site in HTTPS mode, links will point to HTTPS, and if you are visiting the site in HTTP mode, links will point to HTTP.

Why are protocol relative URLs needed?

We need to use protocol relative URLs for a couple reasons:

  1. All requests are served by our caching layer (squid or varnish). If you are browsing the site in HTTPS mode, and another user is browsing the same pages in HTTP mode, two versions of those pages will be stored in our cache, as the links are different between the two modes. This splits our cache, which makes it less efficient and more expensive to operate.
  2. When browsing in HTTPS mode, we want to ensure links point to the correct protocol. When pages are parsed, things like interwiki links are created by the parser. If we do not use protocol relative URLs, then links will point to either HTTPS or HTTP, which will cause users to switch modes randomly.

How does this affect me?

It shouldn’t. Things should continue to work as before. We are currently testing this out on some internal wikis, and have enabled it on test.wikipedia.org so that the entire community will have a couple weeks to test it out before we enable it on all projects.

API users, especially, should test thoroughly. The API, in most cases, will not output protocol-relative URLs, but will continue to output http:// URLs no matter whether you call it over HTTP or HTTPS. This is because we don’t expect API clients to be able to resolve protocol relative URLs correctly, and that the context of these URLs (which is needed to resolve them) will frequently get lost along the way.

The exceptions to this are:

  • HTML produced by the parser will have protocol-relative URLs in <a href=”…”> tags etc.
  • prop=extlinks and list=exturlusage will output URLs verbatim as they appear in the article, which means they may output protocol-relative URLs

If you are getting protocol-relative URLs in some other place in the API, that’s likely a bug.

If you notice any issues related to protocol relative URLs, in the API or not, please let us know.

Note: we’ve also enabled HTTPS on test.wikipedia.org; so, please do test protocol relative URLs in HTTP and HTTPS modes. There is at least one known bug with regards to HTTPS mode and redirects, which will be fixed soon. More to come on this in a later post.

Ryan Lane