In a follow-up to Tuesday’s significant disruption of Amazon Web Services (AWS), the company announced yesterday that the mishap was caused by human error: a typo, to be exact.
“The Amazon Simple Storage Service (S3) team was debugging an issue causing the S3 billing system to progress more slowly than expected. At 9:37AM PST, an authorized S3 team member using an established playbook executed a command which was intended to remove a small number of servers for one of the S3 subsystems that is used by the S3 billing process. Unfortunately, one of the inputs to the command was entered incorrectly and a larger set of servers was removed than intended. The servers that were inadvertently removed supported two other S3 subsystems.”
Amazon’s announcement continues with more details on the chain reaction that ensued from the simple input error. Most observers had assumed a software bug was behind the chaos. Instead, they learned that a “human bug” lay waiting in the well-intended debugging process. All of us are understandably familar (sp.) with that one.
And as ZDNet points out, history is littered with typo miscues that led to major tech outages, mix-ups and heavy losses. These include a missing hyphen in speed and trajectory coding that led to the 1962 explosion of NASA’s Venus rocket, an estimated $80 million loss (over half a billion in today’s dollars) and a lot of unfortunate PR for NASA. 2001: A Space Odyssey writer Arthur C. Clarke called that particular mistake “the most expensive hyphen in history.”
The true dollar loss to AWS and its affected customers, including major publishers and government sites, is unknown. But Amazon will undoubtedly make important changes to its S3 services, including the recovery times of the S3 subsystems, and will likely not see a significant drop in its customer base. It is far too major a player in global cloud infrastructure for a typo to be an actual industry changer.