Last Thursday night, a power outage in a Northern Virginia data center took down some websites running on Amazon Web Services (AWS). The issue was resolved within a few hours, and Amazon was quick to post updates on its AWS status page. The trouble began at around 9:45 p.m. PDT, and the status page displayed the following message for Amazon ElastiCache in Northern Virginia:
Some Cache Clusters in a single AZ in the US-EAST-1 region are currently unavailable. We are also experiencing increased error rates and latencies for the ElastiCache APIs in the US-EAST-1 Region. We are investigating the issue.
Less than an hour later, Amazon identified the source of the problem and posted that it was beginning to bring cache clusters back online. An hour after that, it announced that the service was recovering, and at around midnight PDT it declared the situation resolved:
The service has recovered and is operating normally. All affected Cache Clusters are now back online, API error rates have recovered and customers can once again launch new Cache Clusters. We will follow up with more details of the issue.
The outage left some sites, such as Quora, Pinterest, and Dropbox, unavailable or unusable. Amazon Web Services faced the same type of situation last year, when another data center (in Virginia, no less) suffered the same problem. That earlier outage sparked many questions about whether cloud platforms were safe for major websites to run on.
Amazon's response to last year's issue was to promise better accountability and transparency. To the company's credit, the service seems to have improved: this time its status updates came quickly and the situation was resolved in a timely fashion. Just this week, Amazon Web Services also expanded service and lowered prices for its support packages.