Amazon EC2 Server Issues Cause Web Havoc

Amazon’s Elastic Compute Cloud (EC2) service has caused a bit of disarray around the web as servers have failed. Among the sites/services affected are Foursquare, Quora, Reddit, and Hootsuite (ht: The Next Web). The issues appear to be coming out…

Twitter Down For Some Users

Twitter users are having some problems accessing the site. What else is new, right? This time we’re not seeing the usual fail whale, but a robot graphic instead. Various Twitter apps appear to be working (we tested iPhone/Android/Mac), but Twitter.com…

Google Loses Gmail Users’ Email, Says It Will Be Back in Hours

Some Gmail users have had problems over the last day or so. Email messages have gone missing, along with labels, themes and other personalized settings. Google is working on a fix.

While only a small percentage of Google users were affected, that still accounts for tens of thousands of users, according to ComputerWorld, which estimates the number to be about 35,000.

Google Commits to 99.99% Uptime for Google Apps

Google has announced changes to its service level agreement (SLA) for Google Apps, intended to reduce the possibility that users will experience any downtime. The company says it has eliminated maintenance windows from the SLA, so Google will no longer plan for users to be down while it upgrades services or maintains its systems.
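To put the 99.99% figure from the headline in perspective, here is a rough sketch of the arithmetic. The function name and the 30-day and 365-day periods are our own assumptions for illustration; the excerpt does not say over what period the SLA is measured.

```python
def allowed_downtime_minutes(uptime_percent: float, period_days: float) -> float:
    """Return the downtime, in minutes, permitted by an uptime commitment.

    Illustrative only: the measurement period is an assumption,
    not something stated in the SLA excerpt above.
    """
    minutes_in_period = period_days * 24 * 60
    # The fraction of time that may be lost is whatever uptime does not cover.
    return minutes_in_period * (1 - uptime_percent / 100)


if __name__ == "__main__":
    for days in (30, 365):
        minutes = allowed_downtime_minutes(99.99, days)
        print(f"99.99% uptime over {days} days allows about {minutes:.2f} minutes of downtime")
```

Under those assumptions, a 99.99% commitment leaves room for only a few minutes of downtime in a 30-day month, and less than an hour over a full year.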

Communication Breakdown: When Email Goes Down

At the end of December, some Hotmail users experienced problems with their email – it was gone. Messages and folders went completely missing from their accounts. Luckily, for those users, the emails came back. 

Microsoft says it recovered 100% of email and folders for the accounts affected. Unfortunately, for those who didn’t sign into their accounts between the time of the incident and the time the account was restored, any messages sent to their accounts during that time would have bounced. 

Skype Talks Outage and Prevention of Future Outages

Last week, millions of Skype users lost their connections and experienced various other issues with the service. Though the outage didn’t last much longer than a day, the sheer number of those affected made it a major embarrassment for the company.

It didn’t take long for Skype to go into damage control mode, however. CEO Tony Bates himself jumped on the Skype blog a few times to provide updates, explanations, and of course apologies:

Millions Lose Their Skype Connections [Updated]

Update 3: Now CEO Tony Bates has provided further explanation and apologies

Update 2: Parkes provided another update on the issue today:

Update: Facebook is Up…Kind Of (After Downtime)

Update 3: Facebook has now tweeted saying, "Facebook is available again after being down for a brief period. We apologize for the inconvenience."

Facebook Explains Outage

Facebook was down for over 2.5 hours for some users, according to a post from the company. A post in Facebook’s engineering notes says:

The key flaw that caused this outage to be so severe was an unfortunate handling of an error condition. An automated system for verifying configuration values ended up causing much more damage than it fixed.