Google Compute Engine Customers Lose Data As Lightning Hits Data Center

Chris Crum, IT Management

Google reports that it experienced errors with its Google Compute Engine service from last Thursday through Monday as the result of four successive lightning strikes on electrical systems of one of its data centers in Belgium.

According to BBC News, "Some people have permanently lost access to their files as a result."

"Google takes availability very seriously, and the durability of storage is our highest priority," the company said in a status update. "We apologise to all our customers who were affected by this exceptional incident. We have conducted a thorough analysis of the issue, in which we identified several contributory factors across the full range of our hardware and software technology stack, and we are working to improve these to maximise the reliability of GCE's whole storage layer."

While the company says about 5% of the Standard Persistent Disks in the affected zone experienced at least one I/O read or write failure, it says less than 0.000001% of the allocated persistent disk space remains affected. It is in this small fraction that customers have lost data.
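To put that percentage in perspective, here is a minimal sketch of the arithmetic. The 0.000001% figure comes from Google's status update; the capacities are hypothetical round numbers chosen only for illustration, since Google did not say how much disk space is allocated in the zone. The function name is likewise made up for this example.

```python
# 0.000001% is 0.000001 / 100 = 1 part in 10**8 (one hundred million).
AFFECTED_ONE_PART_IN = 10**8


def affected_bytes_upper_bound(allocated_bytes: int) -> float:
    """Upper bound on affected bytes for a given amount of allocated space."""
    return allocated_bytes / AFFECTED_ONE_PART_IN


# Hypothetical capacities (decimal units), not figures reported by Google.
ONE_TERABYTE = 10**12
ONE_PETABYTE = 10**15

print(affected_bytes_upper_bound(ONE_TERABYTE))
print(affected_bytes_upper_bound(ONE_PETABYTE))
```

Under those assumptions, the bound works out to roughly 10 kilobytes for every terabyte of allocated space, or 10 megabytes for every petabyte. The figure is an aggregate across the zone, so it says nothing about how the loss was distributed among individual customers.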

"This outage is wholly Google's responsibility," the company admits. "However, we would like to take this opportunity to highlight an important reminder for our customers: GCE instances and Persistent Disks within a zone exist in a single Google datacenter and are therefore unavoidably vulnerable to datacenter-scale disasters. Customers who need maximum availability should be prepared to switch their operations to another GCE zone. For maximum durability we recommend GCE snapshots and Google Cloud Storage as resilient, geographically replicated repositories for your data."

This probably shouldn't be considered a reason to keep your data out of the cloud, but it does serve as a cautionary tale: it might not be a bad idea to keep your most important data backed up locally. Just in case.

Google says it has an ongoing program for upgrading to storage hardware that is less susceptible to the power failure issue that led to this incident, and that "most" storage is already running on the upgraded hardware.

Chris Crum
Chris Crum has been a part of the WebProNews team and the iEntry Network of B2B Publications since 2003.