Wikimedia blog

News from the Wikimedia Foundation and about the Wikimedia movement

Ever wondered how the Wikimedia servers are configured?

Well, wonder no longer! To configure the Wikimedia servers, we use Puppet, a configuration management system, which lets us write code that manages all of our servers like a single large application. Of course, to really know how our servers are configured, you’d need to see our Puppet configuration.

Good news: we’ve just released our Puppet configuration in a public Git repository.

What is and isn’t included

Almost everything is included in the repository. We did, however, spend a few weeks removing private and sensitive material, which now lives in a private repository available only to Wikimedia staff and volunteers with root access.

This, of course, means that the Puppet configuration, as released, won’t completely work. The public repository makes references to files and manifests in the private repository. To make the repository work, you’ll need to fill in the missing information. There isn’t very much in the private repository, though, so that task should be fairly easy.
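
One way to find what needs filling in is to search a local clone for references that point outside the public tree. The following is a minimal sketch, not a documented procedure: the helper name list_private_refs is hypothetical, and the default marker string "private" is only a guess at how such references might be spelled in the manifests.

```shell
#!/bin/sh
# Hypothetical helper: list lines in a checkout that mention a marker string.
# Usage: list_private_refs <checkout-dir> [marker]
# The default marker "private" is an assumption, not a documented convention.
list_private_refs() {
    dir=$1
    marker=${2:-private}
    # -r: recurse, -n: print line numbers, -F: treat the marker as a fixed string
    grep -rnF --exclude-dir=.git -- "$marker" "$dir"
}
```

Running list_private_refs ./puppet would print each matching file, line number, and line, which gives a rough checklist of the values you would need to supply yourself.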

The point of making this repository public

We have a couple of reasons for making this repository public:

  1. It shares knowledge with the world
  2. It lets us treat operations like a software development project

Both reasons align with our mission, but we were already sharing most of this knowledge via wikitech. The second reason aligns more closely with our mission, as it lets the world be directly involved in our operations efforts.

Labs and community-oriented operations

The release of this Puppet repository is the first step in the Wikimedia Test/Dev Labs project. We’ll be going further than just making the repository readable by the world. Part of the Test/Dev Labs project is to create a clone of our production cluster. This clone will run a branch of the Puppet repository.

Developers and operations engineers, both staff and community, will be able to push changes to the test branch of the Puppet repository, which will manage the cloned cluster. They’ll then be able to push these changes for review to the production branch of the Puppet repository. The staff operations engineers can then code-review the changes and push them out to the production systems.
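
As a rough sketch of how that flow might look from a contributor’s side: the helper below commits local edits on a test branch, publishes that branch, and then submits the same commit for review against production. The function name propose_change is hypothetical, the branch names "test" and "production" are assumptions based on the wording above, and the refs/for/<branch> target is Gerrit’s usual convention for opening a review, which we assume applies here.

```shell
#!/bin/sh
# Hypothetical helper: commit local edits on a "test" branch, publish it,
# then submit the same commit for review against "production".
# Branch names are assumptions taken from the post's wording.
propose_change() {
    repo_dir=$1   # path to a local clone of the repository
    message=$2    # commit message for the change
    (
        cd "$repo_dir" || exit 1
        # Switch to the test branch, creating it if it does not exist yet
        { git checkout -q test 2>/dev/null || git checkout -q -b test; } &&
        git add -A &&
        git commit -q -m "$message" &&
        # Publish the change on the test branch
        git push -q origin HEAD:refs/heads/test &&
        # Gerrit convention: pushing to refs/for/<branch> opens a review
        git push -q origin HEAD:refs/for/production
    )
}
```

For example, propose_change ./puppet "Adjust apache config" would run the whole sequence inside a subshell, so the caller’s working directory is left unchanged. Account setup and access rights are outside the scope of this sketch.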

Just as they can edit the Wikimedia content, the site interface, and the site’s software (MediaWiki), community members will be able to edit the site’s architecture as well.

Accessing the repository

Since this is a public Git repository, you can do an anonymous git clone like so:

git clone https://gerrit.wikimedia.org/r/p/operations/puppet

You can browse the repository through the gitweb interface. You can see the code review activity via Gerrit.

Ryan Lane
Operations Engineer

27 Responses to “Ever wondered how the Wikimedia servers are configured?”

  1. Yuri says:

    Thank you, guys. It was a buggy version of the git-core package (1:1.6.3.3-2ubuntu0.1).
    git-core 1:1.7.0.4-1ubuntu0.2 works perfectly.

  2. Jon Sailor says:

    Hmm… I was able to clone when I switched to an almost-identical machine, but with newer software. In particular, the machine with git 1.5.6.5 couldn’t clone, but the machine with git 1.7.2.5 could. That might be it, Yuri.

  3. Jon Sailor says:

    Also fails for me:

    (0) hikari ~/workdir/tweak $ git clone https://gerrit.wikimedia.org/r/p/operations/puppet
    Initialized empty Git repository in /home/jon/workdir/tweak/puppet/.git/
    error: The requested URL returned error: 403
    warning: remote HEAD refers to nonexistent ref, unable to checkout.

    (0) hikari ~/workdir/tweak $

    Accessing https://gerrit.wikimedia.org/r/p/operations/puppet.git/info/refs directly in iceweasel fails, too.

  4. Grigor says:

    Thumbs up, Roan! I was just wondering the same.

    (A link to a daily archive might also be a good idea. Will be simpler for git non-speakers.)

  5. netxfly says:

    nice!
    Taking this to study and learn from, thx

  6. Yuri says:

    It seems access is limited now, right?
    $ git clone https://gerrit.wikimedia.org/r/p/operations/puppet.git
    Initialized empty Git repository in /tmp/puppet/.git/
    fatal: https://gerrit.wikimedia.org/r/p/operations/puppet.git/info/refs download error – The requested URL returned error: 403

  7. Pete says:

    How do you deal with Puppet change management and testing configurations? For example, suppose the next website release requires a change to an Apache conf file, a PEAR module, and a crontab.

    • Ryan Lane says:

      We’ll be figuring that out soon enough ;). This is a work in progress. If you are willing to help and learn with us, please contact me.

  8. Robin Walsh says:

    Depending on what version of puppet you’re using, you could also use extlookup: store the values in host-, domain-, or site-specific csv files, use extlookup("super-secret-password"), and then you can simply store the extlookup data in a separate repo. That way you could still publish a sensible “common.csv” with such amusing fields as:

    super-secret-password,g0tcha!suck4z!

  9. micah says:

    Hi,

    Great to see you using Puppet, and publishing your recipes! I work on the puppet packages for Debian, and I hope you find them useful.

    I noticed that you have not embraced the best-practices model of using modules and instead have everything in manifests. Modules are a really interesting way of abstracting out your puppet code that opens the door for great collaboration with other groups who are also working on module development. You can share development on abstract pieces of your infrastructure, without having to give any access. It’s a very interesting way to be involved in collaborative development of your infrastructure, in ways that were never possible before.

    I work with a number of organizations who collaborate on module development. You should check us out: https://labs.riseup.net/code/projects/show/sharedpuppetmodules. Join the mailing list or irc and ask questions; it would be great to work with you!

    • Ryan Lane says:

      @micah: Well, until now we weren’t planning on sharing our puppet config. If you’d like, you can help us with things like this.

  10. Ryan et al

    Great to see these released. Let us know if you’d like any help, advice or code review from Puppet Labs. Happy to help set something up for you.

    • Ryan Lane says:

      @James We’d love to have some help. We aren’t ready just yet to give out accounts, but will be soon. I’ll let you guys know when it’s ready.

  11. Michael says:

    We did the same a year ago for the Mageia project (for the same types of reasons, plus a couple of others like transparency and trust, and to help us recruit sysadmins), and we faced the same issue with passwords.

    So we used extlookup with a default value of x, so anybody could test (in an insecure way). I heard that hiera is also nice for that: you could publish a default file and override it without fiddling with git.

  12. Roan Kattouw says:

    Random idea: why don’t we publish boilerplate versions of the private files with placeholders for passwords etc such that you can replace those placeholders with your own private data and have things Just Work?
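
A minimal sketch of how such placeholder files could be filled in, assuming a token style of @@NAME@@ inside the boilerplate file. The token style, the helper name fill_placeholder, and the file names in the usage note are all hypothetical; nothing like this is confirmed to exist in the repository.

```shell
#!/bin/sh
# Hypothetical helper: print a boilerplate file with @@NAME@@ tokens replaced.
# Usage: fill_placeholder <template> <name> <value>
# The @@NAME@@ token style is an assumption made for this sketch.
fill_placeholder() {
    template=$1   # boilerplate file containing tokens
    name=$2       # token name, e.g. DB_PASSWORD
    value=$3      # your own private value (single line)
    # Escape characters that are special in a sed replacement: \ & and the | delimiter
    escaped=$(printf '%s' "$value" | sed 's/[\\&|]/\\&/g')
    sed "s|@@${name}@@|${escaped}|g" "$template"
}
```

For example, fill_placeholder passwords.pp.example DB_PASSWORD 'my-own-secret' > passwords.pp would write a filled-in copy while leaving the published boilerplate untouched. Values containing newlines are not handled by this sketch.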