Scaling media storage at Wikimedia with Swift


Wikipedia is huge. Almost four million articles in English alone — but as they say, a picture is worth a thousand words (actually, it’s usually closer to several million). In terms of raw bits on disk, the largest project is clearly the Wikimedia Commons, the free media repository integrated with all of the Wikimedia projects. In addition, many projects allow their own local media uploads. As a result, across all wikis, Wikimedia stores millions of images, sounds, and other media files.
We’ve been able to manage the load for quite a while using two servers with lots of local storage (10 TB and 30 TB), but we’re pushing against that limit and would like a more fault-tolerant option. So, for the last few months, we have been working on replacing the infrastructure that holds all that data.
Our goal is a storage system that lets us scale more easily and accept large collections of media from projects like Wiki Loves Monuments and the U.S. National Archives’ donation of their collection of photographs by Ansel Adams.
After evaluating a number of options, we chose to pursue OpenStack Swift. Swift is a distributed object storage system with automatic replication, so that if one host has problems, the requested file is retrieved from another server with no interruption of service. Aside from meeting our needs around performance, reliability, and scalability, it is a good fit because we are also using OpenStack products for Wikimedia Labs.
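To illustrate the failover behavior of a replicated object store, here is a minimal Python sketch. It is not Swift's actual implementation: it assumes three replicas per object and uses hypothetical names (`StorageNode`, `write_replicas`, `read_with_failover`) with in-memory dictionaries standing in for storage hosts. Real Swift locates replicas through its ring and serves requests over HTTP via proxy nodes.

```python
from dataclasses import dataclass, field


class NodeUnavailable(Exception):
    """Raised when a (simulated) storage node cannot serve requests."""


@dataclass
class StorageNode:
    # Hypothetical in-memory stand-in for a back-end storage host.
    name: str
    healthy: bool = True
    objects: dict = field(default_factory=dict)

    def get(self, key: str) -> bytes:
        if not self.healthy:
            raise NodeUnavailable(self.name)
        return self.objects[key]


def write_replicas(nodes, key, data):
    """Store a copy of the object on every node in the replica set."""
    for node in nodes:
        node.objects[key] = data


def read_with_failover(nodes, key):
    """Return the object from the first replica that can serve it."""
    for node in nodes:
        try:
            return node.get(key)
        except (NodeUnavailable, KeyError):
            continue  # this replica failed; try the next one
    raise LookupError(f"no replica could serve {key!r}")


# Assumption: three replicas per object (the post does not state the count).
nodes = [StorageNode("node-a"), StorageNode("node-b"), StorageNode("node-c")]
write_replicas(nodes, "thumb/320px-Example.jpg", b"jpeg-bytes")

nodes[0].healthy = False  # simulate a host failure
data = read_with_failover(nodes, "thumb/320px-Example.jpg")
```

The reader never sees the failure of `node-a`: the request simply falls through to the next replica. Only when every replica is unavailable does the read fail.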
We have just completed the first milestone along the road to replacing our existing storage systems with Swift: all image thumbnails (scaled images such as a 320px version of a picture) are now stored on Swift. Our current production Swift cluster is made up of 4 back-end storage nodes with 22 TB each and 2 front-end proxy nodes that handle user web requests. This new architecture gives us the scalability and reliability we need going forward.
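As a rough back-of-the-envelope check on those numbers, the following Python sketch computes raw and usable capacity. The node count and per-node size come from the post; the replica count of 3 is an assumption (the post does not state it), and the calculation ignores filesystem and metadata overhead.

```python
STORAGE_NODES = 4      # from the post
TB_PER_NODE = 22       # from the post
ASSUMED_REPLICAS = 3   # assumption: the post does not give the replica count


def usable_capacity_tb(nodes: int, tb_per_node: float, replicas: int) -> float:
    """Raw capacity divided by the number of copies kept of each object."""
    return nodes * tb_per_node / replicas


raw_tb = STORAGE_NODES * TB_PER_NODE
usable_tb = usable_capacity_tb(STORAGE_NODES, TB_PER_NODE, ASSUMED_REPLICAS)

print(f"raw: {raw_tb} TB, usable with {ASSUMED_REPLICAS} replicas: {usable_tb:.1f} TB")
```

Under that assumption, 88 TB of raw disk holds roughly 29 TB of distinct data, which is why adding storage nodes, rather than growing a single server, is the natural way to scale such a cluster.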
Over the next few months, we will build a second Swift cluster in our Virginia data center, then work on migrating all of the original media over to Swift as well. For more detail on the implementation and plan for Swift, you can read the documentation on Wikitech, ask questions in the comments below, or visit us in #wikimedia-tech on Freenode IRC.
Ben Hartshorne
Operations
Wikimedia Foundation

Archive notice: This is an archived post from blog.wikimedia.org, which operated under different editorial and content guidelines than Diff.

