Nobody notices when it's not broken: New database servers deployed

18 November 2011 by Ben Hartshorne

The Technical Operations team has just completed behind-the-scenes work that will likely never be noticed by our readers.
Our External Storage databases hold the text for every version of every wiki page; they have slowly grown over the life of Wikipedia and its sister projects. Ten years is a lifetime on the Internet, and the incremental changes made to our external storage system over that period, though appropriate at the time, resulted in a setup that was hard to maintain and was becoming unreliable.

We spent a few weeks analyzing the various servers across which the page text data was spread, in order to gather it all together onto a single host. From there, it could be replicated onto newer, more reliable, and higher-performance hardware. Along the way, we found and fixed a number of inconsistencies to make the dataset more regular.

The deployment of the new hardware lasted a few days (as we moved things piece by piece) and was finished this past Monday with no fanfare. There was a brief period of about 10 minutes during which articles could not be edited while we switched writes to the new hardware. The end result is a barrage of small improvements, all of which together make for a happy TechOps team:

- average query duration has dropped from about 15 ms to around 8 ms, and the worst case from 576 ms down to 60 ms;
- replication and failover processes are now well known and standardized;
- total hardware used has dropped from around 30 servers to 8, now in two locations;
- hosts no longer double up as web servers and database servers for text; dedicated servers are used for the database.
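
To put the figures above in proportion, here is a small sketch that works out the relative reduction for each one. The helper name `percent_reduction` and the dictionary labels are illustrative assumptions, not part of any tooling used in the migration, and the inputs are the approximate values quoted in the list.

```python
def percent_reduction(before: float, after: float) -> float:
    """Return how much smaller `after` is than `before`, as a percentage."""
    return round((before - after) / before * 100, 1)


# Approximate figures quoted in the list above (hypothetical labels).
improvements = {
    "average query duration (ms)": (15, 8),
    "worst-case query duration (ms)": (576, 60),
    "servers in use": (30, 8),
}

for name, (before, after) in improvements.items():
    change = percent_reduction(before, after)
    print(f"{name}: {before} -> {after} ({change}% lower)")
```

Since the source numbers are rounded ("about", "around"), the percentages should be read as rough indications rather than precise measurements.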

It’s a small victory in the battle against entropy, but an important prerequisite for carrying out our mission of providing unfettered and reliable access to the sum of all knowledge.
Ben Hartshorne, Operations Engineer
