Facebook Explains Outage

Chris Crum | Social Media


Facebook was down for more than 2.5 hours for some users, according to the company. In a post on its engineering notes, Facebook explains:

The key flaw that caused this outage to be so severe was an unfortunate handling of an error condition. An automated system for verifying configuration values ended up causing much more damage than it fixed.

The intent of the automated system is to check for configuration values that are invalid in the cache and replace them with updated values from the persistent store. This works well for a transient problem with the cache, but it doesn’t work when the persistent store is invalid.
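To make the failure mode described above concrete, here is a minimal Python sketch. It is not Facebook's code. The `PersistentStore` and `Cache` classes, the `is_valid` rule and the `timeout` key are all hypothetical. The sketch shows that when only the cached value is bad, the check repairs the cache with a single store read. When the store itself holds the bad value, every request fails the check and falls through to the store, so the "fix" multiplies load instead of reducing it.

```python
from dataclasses import dataclass, field


@dataclass
class PersistentStore:
    """Hypothetical stand-in for the authoritative configuration store."""
    values: dict
    reads: int = 0  # counts how many times clients fall back to the store

    def get(self, key):
        self.reads += 1
        return self.values[key]


@dataclass
class Cache:
    """Hypothetical stand-in for the cache that clients read first."""
    values: dict = field(default_factory=dict)


def is_valid(value) -> bool:
    # Assumed validity rule for illustration: a config value must be a positive int.
    return isinstance(value, int) and value > 0


def read_config(key, cache: Cache, store: PersistentStore):
    """Read a value, 'repairing' the cache from the store if it looks invalid."""
    value = cache.values.get(key)
    if not is_valid(value):
        # The automated fix: replace the bad cached value with the stored one.
        value = store.get(key)
        cache.values[key] = value
    return value


def simulate(cache_value, store_value, requests=1000):
    """Run many reads; return (store reads, whether the cache ends up valid)."""
    store = PersistentStore({"timeout": store_value})
    cache = Cache({"timeout": cache_value})
    for _ in range(requests):
        read_config("timeout", cache, store)
    return store.reads, is_valid(cache.values["timeout"])


print(simulate(-1, 30))  # (1, True): transient cache problem, repaired once
print(simulate(-1, -1))  # (1000, False): invalid store, every request hits it
```

The design point the sketch illustrates is that the repair logic assumes the store is always correct. Once that assumption breaks, there is nothing to converge to, and the check turns every read into a store read.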

Facebook's full post has more of the technical details. The company has turned off the system that attempts to correct configuration values and is exploring new designs for it.

I attended a screening of the movie The Social Network last night. As Mark Zuckerberg's character gets the site launched, he stresses how badly downtime would hurt its reputation. I thought that was kind of funny, considering the timing.

Chris Crum
Chris Crum has been a part of the WebProNews team and the iEntry Network of B2B Publications since 2003.