Wikipedia is huge. Almost four million articles in English alone — but as they say, a picture is worth a thousand words (actually, it’s usually closer to several million). In terms of raw bits on disk, the largest project is clearly the Wikimedia Commons, the free media repository integrated with all of the Wikimedia projects. In addition, many projects allow their own local media uploads. As a result, across all wikis, Wikimedia stores millions of images, sounds, and other media files.
We’ve been able to manage the load for quite a while using two servers with lots of local storage (10TB and 30TB), but we’re pushing against that limit and would like a more fault-tolerant option. So, for the last few months, we have been working on replacing the infrastructure that holds all that data.
Our goal is to have a storage system that will allow us to scale more easily and to accept large collections of media, such as those from Wiki Loves Monuments and the U.S. National Archives’ donation of its collection of photographs by Ansel Adams.
After evaluating a number of options, we chose to pursue OpenStack Swift. Swift is a distributed object storage system with automatic replication, so that if one host has problems, the requested file is retrieved from another server with no interruption of service. Aside from meeting our needs around performance, reliability, and scalability, it is a good fit because we are also using OpenStack products for Wikimedia Labs.
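To make the replication idea concrete, here is a minimal toy sketch in Python of a read that falls back to another replica when one host is down. It is not Swift's code or API: the `StorageNode` class, the `fetch_with_failover` function, the node names, and the object key are all hypothetical and exist only to illustrate the behavior described above.

```python
class NodeDown(Exception):
    """Raised when a (simulated) storage node cannot serve a request."""


class StorageNode:
    """Hypothetical in-memory stand-in for one storage host."""

    def __init__(self, name, objects=None, healthy=True):
        self.name = name
        self.objects = dict(objects or {})
        self.healthy = healthy

    def get(self, key):
        if not self.healthy:
            raise NodeDown(self.name)
        return self.objects[key]


def fetch_with_failover(replicas, key):
    """Try each replica in order; return (node name, data) from the first that answers."""
    for node in replicas:
        try:
            return node.name, node.get(key)
        except (NodeDown, KeyError):
            continue  # this replica failed, so move on to the next copy
    raise LookupError(f"no replica could serve {key!r}")


# Every node holds a copy of the same thumbnail; the first node is simulated as down.
thumb = {"320px-Example.jpg": b"jpeg-bytes"}
replicas = [
    StorageNode("node-a", thumb, healthy=False),
    StorageNode("node-b", thumb),
    StorageNode("node-c", thumb),
]
served_by, data = fetch_with_failover(replicas, "320px-Example.jpg")
print(served_by)
```

In this sketch the caller never sees the failure of the first node; the request is simply answered by the next host that holds a copy.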
We have just completed the first milestone along the road to replacing our existing storage systems with Swift: all image thumbnails (scaled images such as a 320px version of a picture) are now stored on Swift. Our current production Swift cluster is made up of 4 back-end storage nodes with 22TB each and 2 front-end proxy nodes that handle user web requests. This new architecture gives us the scalability and reliability we need going forward.
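As a rough back-of-the-envelope check on those numbers, the sketch below works out the raw capacity of the cluster and an estimate of usable space. The node count and per-node size come from the figures above; the replica count of 3 is an assumption for illustration only and is not stated in this post.

```python
STORAGE_NODES = 4      # back-end storage nodes (from the post)
TB_PER_NODE = 22       # capacity per node in TB (from the post)
ASSUMED_REPLICAS = 3   # ASSUMPTION: copies kept of each object; not stated in the post

raw_tb = STORAGE_NODES * TB_PER_NODE
# Each object is stored ASSUMED_REPLICAS times, so usable space shrinks by that factor.
usable_tb = raw_tb / ASSUMED_REPLICAS

print(f"raw: {raw_tb} TB, usable: ~{usable_tb:.1f} TB")
```

With a different replica count, only `ASSUMED_REPLICAS` needs to change; the raw figure of 88TB stays the same.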
Over the next few months we will build a second Swift cluster in our Virginia data center, then work on migrating all of the original media over to Swift as well. For more detail on the implementation and plan for Swift, you can read the documentation on Wikitech, ask questions in the comments below, or come and visit us in #wikimedia-tech on Freenode IRC.