Last week, millions of Skype users lost their connections and experienced various other issues with the service. Though the outage didn’t last much longer than a day, the sheer number of users affected made it a major embarrassment for the company.
It didn’t take long for Skype to go into damage-control mode, however. CEO Tony Bates himself jumped on the Skype blog a few times to provide updates, explanations, and of course apologies.
Now, Skype CIO Lars Rabbe has chimed in with a "post-mortem on the Skype outage". He details the cause of the failure, how the company restored the service, and most importantly, what it is doing to prevent such a thing from happening again.
If you’re interested in the technical explanation of what happened, simply refer to Rabbe’s post. In the simplest possible terms, as he puts it in his intro, Skype’s P2P network became unstable and suffered a "critical failure."
As for prevention, Rabbe says the company will continue to examine its software for potential issues and provide "hotfixes" where appropriate, either for download or for automatic delivery to users. "We will also be reviewing our processes for providing ‘automatic’ updates to our users so that we can help keep everyone on the latest Skype software," he says. "We believe these measures will reduce the possibility of this type of failure occurring again."
"Second, we are learning the lessons we can from this incident and reviewing our processes and procedures, looking in particular for ways in which we can detect problems more quickly to potentially avoid such outages altogether, and ways to recover the system more rapidly after a failure," he adds. "Third, while our Windows v5 software release was subject to extensive internal testing and months of Beta testing with hundreds of thousands of users, we will be reviewing our testing processes to determine better ways of detecting and avoiding bugs which could affect the system."
He says that Skype will also continue to invest in capacity and resilience, under an investment program that is already in place.