We’ve been seeing some general slowdowns in our image and media file serving recently, including some instances in the last couple days where the sites as a whole have been affected to the point of extreme slowness or temporary inaccessibility.
Domas believes this is related to this reported problem with NFS performance when ZFS snapshots are active. We’ve had some luck so far with it improving after dropping older snapshots (possibly along with restarting NFS and temporarily disabling the image scaler servers to give it a little breathing room to reset).
We’ve been planning for some time to redo the way we access our media files internally which can help reduce the impact on the rest of the site when load problems on the file servers occur, but we might also be able to spread out the load among multiple servers to improve things even more.
Updates will come as we get things back on track…
Update 2009-07-15: We’re temporarily shutting off uploads while we apply the ZFS fix patch and reboot the main file server. You may see some missing images or funky error messages for a little bit, but the sites should otherwise continue working normally until the file server is back up.
Update 2: Server is patched and uploads are back online. This should resolve our performance problems while we continue rearranging the upload servers to be more future-proof.
Brion Vibber, Lead Software Architect