News agencies today are reporting that pop star Michael Jackson has been hospitalized and may have died. We can all think back on how the King of Pop has touched our lives, but today we can also see how high-profile news events can affect a web site… See also past events such as the Popedotting and the 2008 US election.
Here at the office we first suspected something was going on when IM services such as AOL Instant Messenger started logging people out, and we quickly noticed that our own servers were hitting load spikes…
Server CPU load spike (likely several more to come):
The actual traffic load spike is subtler; server effects can be disproportionate to the actual traffic:
Update 22:53 UTC:
The traffic is pretty much holding steady but we’ve still been seeing intermittent load spikes:
These are at least in part due to one of our memcached internal data cache servers going wonky and swapping due to overuse of memory from text storage running on the same node. We’ve reduced traffic on the node and restarted it to even out its memory usage. (Thanks Domas!)
Update 23:00 UTC:
You may see intermittent messages like “(Cannot contact the database server: Unknown error (10.0.6.24))” as temporary database overloads cascade around the system. Sorry for the inconvenience while we work the kinks out; just wait a few minutes and try again…
Update 23:43 UTC:
We believe a large chunk of the CPU overload is due to cache swarming: many visitors simultaneously triggering a re-render of the page because its cached version has expired. I’ve put in a temporary hack which will reduce the amount of rendering, but may cause some people to see out-of-date copies of the page.
Here’s a link to Domas’s blog post with technical details on the cache swarming problem.
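For readers who want a feel for cache swarming, here is a minimal Python sketch of one common mitigation: let a single request re-render an expired page while everyone else is handed the stale copy. This is not the actual hack deployed on our servers. The class name `StaleWhileRenderCache`, the `render` callback and the in-process dictionary standing in for the cache are all assumptions made up for illustration.

```python
import threading
import time


class StaleWhileRenderCache:
    """Toy in-process page cache (hypothetical): one renderer per key,
    stale copies for everyone else while the re-render is in progress."""

    def __init__(self, render, ttl_seconds):
        self._render = render            # assumed callback: key -> rendered page
        self._ttl = ttl_seconds
        self._entries = {}               # key -> (page, expiry timestamp)
        self._locks = {}                 # key -> lock held while rendering that key
        self._guard = threading.Lock()   # protects the two dicts above

    def _lookup(self, key):
        with self._guard:
            return self._entries.get(key)

    def get(self, key):
        with self._guard:
            entry = self._entries.get(key)
            lock = self._locks.setdefault(key, threading.Lock())
        if entry is not None and entry[1] > time.monotonic():
            return entry[0]              # fresh copy, nothing to render

        if lock.acquire(blocking=False):
            # We won the race: render once on behalf of all visitors.
            try:
                entry = self._lookup(key)
                if entry is not None and entry[1] > time.monotonic():
                    return entry[0]      # someone refreshed it just before us
                page = self._render(key)
                with self._guard:
                    self._entries[key] = (page, time.monotonic() + self._ttl)
                return page
            finally:
                lock.release()

        if entry is not None:
            return entry[0]              # render in progress: serve the stale copy

        # No copy exists yet: wait for the renderer, then read its result.
        with lock:
            pass
        entry = self._lookup(key)
        return entry[0] if entry is not None else self._render(key)
```

Without the per-key lock, every visitor arriving after expiry would call `render` at once, which is the swarm. With it, a burst of requests for the same page costs one render, at the price of some visitors briefly seeing an out-of-date copy.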