We’re adding an off site archive for Commons and the XML snapshots

Thanks are due to eBart consulting and User:Milosh for providing a backup server and storage array at their colocation facility in Europe. This server will store archives of our publicly available data from Wikimedia Commons and the XML snapshots.

This has been a long time coming, as having an off-site location for our data is extremely important for disaster recovery. With this archive in place, we’ll have another external archive space for Commons image data to complement the one living at MIT.

Given the 10 TB donated, we’re likely to also store yearly archives of the XML snapshots.

This won’t stop us from continuing to be rigorous about our internal backups of the same data, or from keeping all of our users’ private data within our own data centers. It will simply be another physical space for us to archive our publicly available content.

While this offline mirror will only be used internally, we have some leads on other sponsors who might be able to offer a publicly available mirror. Over the next few weeks we’ll be streamlining the offline archiving process and seeding the initial Commons upload, which currently comes in at just under 4 TB! Once we make some sense of how best to manage the archiving process we’ll see who else is able to host our data.

Categories: Operations, Technology

11 Comments on We’re adding an off site archive for Commons and the XML snapshots

草窟主人 7 years

Can anyone tell me how to set up Wikipedia’s URL Rewrite feature?

Tomasz 7 years

Looks like Mark already got to this and we have reports on Village Pump that everything is working fine again.


iiuniveadds 7 years

Hi. Great stuff. I actually stumbled upon your blog accidentally but was amazed by its content. The information was really good and I would love to come back often. Shall I link to your blog from my blog? The same is expected from your side, for good collaboration and mutual gain.

CopperKettle 7 years

I hope there will be some reaction, third day started with many people having crippled access to Wiki.

CopperKettle 7 years

Here’s the discussion at the local portal, users of several Yekt. providers have almost no access to Wikipedia for two days:


There are tracerts included.

CopperKettle 7 years

Truly so! I’m from Yekaterinburg, Russia, and Wikipedia is very slow for two days already, with no pictures loading at all. Up to 2 minutes to open any single page.

ifilippov 7 years

I don’t know where to write it, so put it as a comment to the most recent news.

A lot of users in Russia and Belarus currently cannot access their local Wikipedias. The confirmed reports are from Minsk, Yekaterinburg, Zlatoust, Yuruzan, Izhevsk.

The typical tracert is the following:
1 9 ms 9 ms 10 ms
2 10 ms 12 ms 11 ms
3 10 ms 9 ms 11 ms
4 11 ms 10 ms 11 ms
5 68 ms 195 ms 19 ms
6 20 ms 13 ms 12 ms
7 42 ms 43 ms 43 ms msk01-xe-0-2-0-74.inet.euro-tel.ru []
8 43 ms 45 ms 45 ms m9-cr02-Te4-8.msk.stream-internet.net []
9 99 ms 98 ms 98 ms tct-cr01-te5-1.ams.stream-internet.net []
10 101 ms 103 ms 103 ms ams-ix.2ge-2-1.br1-knams.wikimedia.org []
11 * * * Request timed out.

It is possible to login via secured server only. At the same time some users do not experience such problems.

However, the general slowdown is noted by many Russians. There are several discussions on the problems with uploads of images.

The problems started 2 days ago.

simonfj 7 years

“Once we make some sense of how best to manage the archiving process we’ll see who else is able to host our data”.

I’d better give you the heads up. You might call it making “some sense of how best to manage the archives”, but in the world of archivists, they’re called curators.

Watch this space. http://www.wikimedia.org.au/wiki/GLAM

Mdupont 7 years

great news!
This is exciting.
Now all we need is some grid/p2p architecture for distributing and processing it.

Erik Zachte 7 years

Music to my ears.

Now how to bribe Murphy for the next couple of weeks ;-)
