Our primary router for the pmtpa cluster had to be rebooted today at 12:00 GMT. A line card had died and needed replacing, and the system required a reboot for the replacement to take full effect. Once the reboot finished, CentralNotice was adding a lot of overhead and had to be disabled so that our caching cluster could catch up. The excess load then overwhelmed the primary database master for S3, and we are in the process of switching database masters to another server.
Had all gone as planned, this would have been a quick 5-minute router reboot and we would have been back online. Unfortunately, things do not always go smoothly, so what should have taken 5 minutes has taken much longer. This post will be updated as more details become available.
Update: We have switched database masters successfully and all sites and projects should once again be fully functional as of 14:13 GMT.
Rob Halsell, Operations Engineer