Wikimedia site outage, 6 August 2012

Wikimedia sites experienced an outage today that started at about 6:15am PDT (13:15 UTC). Except for the mobile site, the sites were brought back up by 7:18am PDT (14:18 UTC). Mobile site services resumed at about 8:35am PDT (15:35 UTC).
At about 6:15am PDT, we were alerted to a site issue, and our team found that network connectivity between our two data centers had been severed. When we checked with our network provider, they informed us that the outage was caused by a fiber cut between the two data centers.
The data centers (one in Ashburn, Virginia and the other in Tampa, Florida) are connected by two separate fiber links for redundancy. While Ashburn serves most of the traffic, it depends on our Tampa data center for backend services (e.g. databases).
We do operate two separate 10-gigabit fiber links between the data centers. We are now working with our network provider to determine how and why that fiber cut affected us when our network is supposed to be redundant. We are still waiting for their full report.
The team worked around the outage by rerouting traffic to Tampa, bypassing the Ashburn site. Connectivity over one of the provider’s network links was restored at about 8:35am PDT (15:35 UTC). The second link was restored at about 11:30am PDT (18:30 UTC). However, we will not revert traffic to Ashburn until we are comfortable with their fix. The switch back to Ashburn from Tampa should not be apparent to users.
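For readers who want the timeline at a glance, here is a minimal sketch that derives the approximate durations from the times quoted above. The variable and function names are our own for illustration, the times are the "about" figures from this post, and PDT is assumed to be a fixed UTC-7 offset.

```python
from datetime import datetime, timedelta, timezone

# PDT modeled as a fixed UTC-7 offset (an assumption that holds for this date).
PDT = timezone(timedelta(hours=-7), "PDT")


def utc(hour, minute):
    """Return a UTC timestamp on 6 August 2012."""
    return datetime(2012, 8, 6, hour, minute, tzinfo=timezone.utc)


# Approximate times as reported in the post (hypothetical labels).
OUTAGE_START = utc(13, 15)
SITES_RESTORED = utc(14, 18)         # all sites except mobile
MOBILE_RESTORED = utc(15, 35)        # also when the first link came back
SECOND_LINK_RESTORED = utc(18, 30)


def minutes_between(start, end):
    """Whole minutes elapsed between two timestamps."""
    return int((end - start).total_seconds() // 60)


milestones = [
    ("main sites restored", SITES_RESTORED),
    ("mobile site restored", MOBILE_RESTORED),
    ("second link restored", SECOND_LINK_RESTORED),
]

for label, when in milestones:
    local = when.astimezone(PDT).strftime("%H:%M %Z")
    elapsed = minutes_between(OUTAGE_START, when)
    print(f"{label}: {local}, {elapsed} minutes after the outage began")
```

By this arithmetic the main sites were down for roughly 63 minutes, the mobile site for roughly 140 minutes, and full link redundancy was back about 315 minutes after the outage began.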
UPDATE: Expanded report posted here: http://wikitech.wikimedia.org/view/Site_issue_Aug_6_2012
Please see status.wikimedia.org for site availability.

CT Woo, Director of Technical Operations

Archive notice: This is an archived post from blog.wikimedia.org, which operated under different editorial and content guidelines than Diff.

10 Comments

An update with the root cause is now available: http://wikitech.wikimedia.org/view/Site_issue_Aug_6_2012
CT Woo

With a budget of over 20 million dollars coming from the fundraiser, this is pretty much unacceptable. 90% of the money should go to technology aspects like this. Outage is not an option.

@jolison (comment #2): We all agree that any outage is unacceptable. As of December 2011, the average uptime was 99.97% though, so it seems the sysadmins are actually doing a great job. We can try as hard as we can to minimize downtime (all sites do that), but unfortunately sometimes there are circumstances beyond the system administrators’ control. As CT said, the Foundation is paying to have redundant cables that would prevent errors like this, so the amount of money allocated to tech isn’t at issue here. It seems the issue was with the third-party company that manages the datacenter,…
