Wikimedia blog

News from the Wikimedia Foundation and about the Wikimedia movement

PMTPA Router Reboot – Scheduled Downtime (Resolved)

Our primary router for the pmtpa cluster had to be rebooted today at 12:00 GMT.  A line card had died and needed replacing, and the system required a reboot for the replacement to fully take effect.  Once that finished, CentralNotice was adding a lot of overhead and had to be disabled so that our caching cluster could catch up.  The overload in turn overwhelmed the primary database master for S3, and we are in the process of switching database masters to another server.

Had all gone as planned, this would have been a quick 5-minute router reboot and we would have been back online.  Unfortunately, things do not always go smoothly, so what should have taken 5 minutes has taken a while.  This post will be updated as more details become available.

Update: We have switched database masters successfully and all sites and projects should once again be fully functional as of 14:13 GMT.

Rob Halsell, Operations Engineer

One Response to “PMTPA Router Reboot – Scheduled Downtime (Resolved)”

  1. As tweeted today:

    Thanx to all Wikipedia system administrators (and developers) on System Administrator Appreciation Day today.

    best wishes from Germany, Achim
