On Christmas Eve, Netflix experienced a widespread outage of its streaming service due to problems at Amazon Web Services. If you tried to use Netflix that day, you may have noticed, though not everyone was affected (trust me, I did plenty of Netflix watching over my holiday break). If you were online at all that day, chances are you at least saw someone complaining about it, or saw reports about the problem.
Customers were affected in the U.S., Canada and Latin America. Subscribers in the UK, Ireland and the Nordic countries were not affected at all.
On Monday, Netflix put a post up on its Tech Blog to explain what happened. Essentially, the streaming disruption was caused by problems with Amazon’s Elastic Load Balancer (ELB), which routes network traffic to Netflix.
Both companies have apologized for the outage, which began at about 12:30 PM Pacific and “grew in scope” later in the day.
“Netflix uses hundreds of ELBs,” explains Netflix’s Adrian Cockcroft in the blog post. “Each one supports a distinct service or a different version of a service and provides a network address that your Web browser or streaming device calls. Netflix streaming has been implemented on over a thousand different streaming devices over the last few years, and groups of similar devices tend to depend on specific ELBs. Requests from devices are passed by the ELB to the individual servers that run the many parts of the Netflix application. Out of hundreds of ELBs in use by Netflix, a handful failed, losing their ability to pass requests to the servers behind them. None of the other AWS services failed, so our applications continued to respond normally whenever the requests were able to get through.”
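To make Cockcroft's description concrete, here is a minimal Python sketch of the routing pattern he describes: groups of similar devices each depend on one load balancer, and a failed load balancer cuts off only the groups behind it while the servers themselves stay healthy. Everything in it is an illustrative assumption rather than Netflix's or AWS's actual code: the `LoadBalancer` class, the ELB names, the server names, and the device-group mapping are all hypothetical, and the real system involves hundreds of ELBs.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class LoadBalancer:
    """Toy stand-in for an ELB (hypothetical, not the AWS API)."""
    name: str
    servers: List[str]
    healthy: bool = True
    next_index: int = 0

    def route(self, request: str) -> str:
        # A failed balancer cannot pass requests to the servers behind it,
        # even though those servers are still running.
        if not self.healthy:
            raise ConnectionError(f"{self.name} cannot reach its servers")
        # Simple round-robin over the servers behind this balancer.
        server = self.servers[self.next_index % len(self.servers)]
        self.next_index += 1
        return f"{server} served {request}"


# Hypothetical mapping: each group of similar devices calls one balancer.
DEVICE_GROUP_TO_ELB: Dict[str, str] = {
    "game-console": "elb-consoles",
    "smart-tv": "elb-tvs",
    "web-browser": "elb-web",
}


def build_balancers() -> Dict[str, LoadBalancer]:
    """Create three healthy balancers with made-up server names."""
    return {
        "elb-consoles": LoadBalancer("elb-consoles", ["api-1", "api-2"]),
        "elb-tvs": LoadBalancer("elb-tvs", ["api-3", "api-4"]),
        "elb-web": LoadBalancer("elb-web", ["api-5", "api-6"]),
    }


def start_stream(
    device_group: str, title: str, balancers: Dict[str, LoadBalancer]
) -> Optional[str]:
    """Return the server response, or None if the request never got through."""
    elb = balancers[DEVICE_GROUP_TO_ELB[device_group]]
    try:
        return elb.route(f"play {title!r} for {device_group}")
    except ConnectionError:
        return None


if __name__ == "__main__":
    balancers = build_balancers()
    # Simulate one balancer failing; the others keep working.
    balancers["elb-consoles"].healthy = False
    for group in DEVICE_GROUP_TO_ELB:
        result = start_stream(group, "Some Title", balancers)
        print(group, "->", result or "request never reached the servers")
```

In this toy model, only the device group mapped to the failed balancer loses service, which mirrors why the outage hit some devices hard while others were barely affected.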
The Netflix site itself was up and running throughout the incident, so customers could still access some functionality, and new customers could sign up (Update: Some users say they have been unable to use various site features). According to the company, streaming from Macs and PCs was only slightly reduced from normal.
“Even though Netflix streaming for many devices was impacted, this wasn’t an immediate blackout,” says Cockcroft. “Those devices that were already running Netflix when the ELB problems started were in many cases able to continue playing additional content.”
The company says it is working on ways to extend its resiliency to handle regional outages.
“It is still early days for cloud innovation and there is certainly more to do in terms of building resiliency in the cloud,” says Cockcroft. “In 2012 we started to investigate running Netflix in more than one AWS region and got a better gauge on the complexity and investment needed to make these changes. We have plans to work on this in 2013.”
On a related note, Netflix says it sees higher usage on Christmas Day, with increasing streaming rates until people go back to school and work.