Google on Server Maintenance

    April 11, 2007

On his blog, Bill Koslosky reports on a Google talk about server maintenance*, given by Google’s Luiz Barroso (pictured):

With all the effort to push MHz, little regard has been given to power consumption, to the point where parts become too power-inefficient and temperature management becomes a problem. Also, the cost of watts surpasses that of hardware, until energy cost becomes the main expense for the data center.
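To make the "cost of watts surpasses that of hardware" point concrete, here is a rough back-of-the-envelope sketch. The function name, the `overhead` factor (standing in for cooling and power distribution), and all the numbers in the usage note are illustrative assumptions, not figures from the talk.

```python
def years_until_energy_exceeds_hardware(hardware_cost, avg_watts,
                                        price_per_kwh, overhead=1.0):
    """Years of continuous operation until cumulative energy cost
    passes the purchase price of the hardware.

    overhead is a hypothetical multiplier for cooling and power
    distribution on top of the server's own draw (assumption).
    """
    hours_per_year = 24 * 365
    kwh_per_year = avg_watts / 1000 * hours_per_year * overhead
    yearly_energy_cost = kwh_per_year * price_per_kwh
    return hardware_cost / yearly_energy_cost
```

With made-up inputs such as a $2,000 server drawing 250 W at $0.10 per kWh and an overhead factor of 2, `years_until_energy_exceeds_hardware(2000, 250, 0.10, overhead=2.0)` comes out to roughly four and a half years. Cheaper hardware or a higher power draw shortens that break-even point.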

Luiz also talked about predicting hardware failures:

The common wisdom is that disk failure rates are below 1% per year, and that higher temperature increases the failure rate. For a predictive failure model, Google uses SMART (Self-Monitoring, Analysis and Reporting Technology). This collects signals that may detect bad media surface, bad servo components, electronic/transmission problems, and vibration.
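The report does not describe how the predictive model works, so the following is only a minimal sketch of the general idea of flagging a drive from SMART-style counters. The signal names, the thresholds, and the `at_risk_signals` helper are all hypothetical, and this is a simple threshold check, not Google's model.

```python
# Hypothetical signals and limits (assumptions, not from the talk).
# A reading above its limit is treated as a warning sign.
WATCHED_SIGNALS = {
    "reallocated_sectors": 0,
    "pending_sectors": 0,
    "scan_errors": 0,
}

def at_risk_signals(readings, thresholds=WATCHED_SIGNALS):
    """Return the sorted names of watched signals whose reading
    exceeds its threshold. Missing readings are treated as 0."""
    return sorted(
        name
        for name, limit in thresholds.items()
        if readings.get(name, 0) > limit
    )
```

For example, `at_risk_signals({"reallocated_sectors": 3, "pending_sectors": 0})` returns `["reallocated_sectors"]`, while a drive with no readings above the limits returns an empty list. A real predictive model would weigh such signals statistically instead of using fixed cutoffs.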

*The talk’s title: “Watts, Faults, and Other Fascinating Dirty Words Computer Architects Can No Longer Afford to Ignore.”

[Thanks Bill! Photo by Bill, with permission.]