Google’s “Machine Learning” Data Centers

At a recent presentation at Data Centers Europe, Google’s Joe Kava announced that the company had begun using a neural network to analyze the oceans of data it collects about its server farms and to recommend ways to improve them.
Google engineer Jim Gao is largely credited with creating the neural network. Gao is well acquainted with large projects like this; for years he has worked on cooling analysis using computational fluid dynamics, which uses monitoring data to create a 3D model of airflow within a server room. He believed not only that it was possible to create a model that tracks a broader set of variables – including IT load, weather conditions, and the operations of the cooling towers, water pumps, and heat exchangers that keep Google’s servers cool – but also that Google as a whole could be doing much more with that data. He was so fascinated by the possibility of using this type of artificial intelligence that he took a course in machine learning from Stanford University Professor Andrew Ng.

The mechanical plant at a Google facility in Oregon. The data center team constantly monitors the performance of heat exchangers as well as other mechanical equipment here.

Putting The “Machine” To Work
What Gao designed works a lot like other examples of machine learning, such as speech recognition: a computer analyzes large amounts of data to recognize patterns and “learn” from them. In a dynamic environment like a data center, it can be difficult for humans to see how all of the variables (IT load, outside air temperature, etc.) interact with each other. Computers are good at finding the underlying story in the data, so Gao took the information gathered during the course of Google’s daily operations and ran it through a model to help make sense of complex interactions that his team might not otherwise have noticed.
Neural networks mimic how the human brain works and allow computers to adapt and learn tasks without being explicitly programmed for them. Google’s search engine is often cited as an example of this type of machine learning, which is also a key research focus at the company.
In this particular case, computers crunch all available data (IT load, temperature, etc.) and analyze interactions that may, in fact, be impossible for a human mind to grasp. The end result is a prediction of Power Usage Effectiveness (PUE), the ratio of a facility’s total energy use to the energy delivered to its IT equipment, which indicates how efficiently available power is being turned into computing.
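As a rough illustration of the idea described above, the sketch below defines PUE and fits a tiny one-hidden-layer neural network to predict it from two inputs. Everything in it is an assumption made for illustration: the telemetry is synthetic, the overhead formula is invented, and the network size, learning rate, and variable names are arbitrary. It is not Google's model, which draws on many more variables.

```python
import numpy as np


def pue(total_facility_kw, it_load_kw):
    """PUE = total facility power / IT equipment power (1.0 is the ideal floor)."""
    return total_facility_kw / it_load_kw


# Synthetic telemetry, invented for this sketch (NOT real data center data).
rng = np.random.default_rng(0)
n = 500
it_load_kw = rng.uniform(500.0, 1500.0, n)
outside_temp_c = rng.uniform(0.0, 35.0, n)
# Hypothetical overhead: a fixed part, a load-proportional part, and a
# cooling part that grows with outside temperature.
overhead_kw = 50.0 + 0.05 * it_load_kw + 1e-4 * it_load_kw * outside_temp_c**2
target_pue = pue(it_load_kw + overhead_kw, it_load_kw)

# Standardize the inputs and center the target so training behaves well.
X = np.column_stack([it_load_kw, outside_temp_c])
X = (X - X.mean(axis=0)) / X.std(axis=0)
t = target_pue - target_pue.mean()

# One hidden layer with tanh units and a linear output.
hidden = 8
W1 = rng.normal(0.0, 0.5, (2, hidden))
b1 = np.zeros(hidden)
W2 = rng.normal(0.0, 0.1, (hidden, 1))
b2 = np.zeros(1)

learning_rate = 0.05
losses = []
for step in range(2000):
    # Forward pass.
    h = np.tanh(X @ W1 + b1)            # shape (n, hidden)
    pred = (h @ W2 + b2).ravel()        # shape (n,)
    err = pred - t
    losses.append(float(np.mean(err**2)))

    # Backward pass: gradients of the mean squared error.
    g_pred = (2.0 / n) * err[:, None]   # shape (n, 1)
    g_W2 = h.T @ g_pred
    g_b2 = g_pred.sum(axis=0)
    g_h = (g_pred @ W2.T) * (1.0 - h**2)
    g_W1 = X.T @ g_h
    g_b1 = g_h.sum(axis=0)

    # Full-batch gradient descent update.
    W1 -= learning_rate * g_W1
    b1 -= learning_rate * g_b1
    W2 -= learning_rate * g_W2
    b2 -= learning_rate * g_b2

print(f"MSE at first step: {losses[0]:.6f}, at last step: {losses[-1]:.6f}")
```

The network never sees the overhead formula; it only sees inputs and the resulting PUE, and the falling error shows it picking up the relationship from the data.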

A simplified version of what the machine models do: analyze data, find the hidden interactions, and provide recommendations that optimize energy efficiency.

The Results
After some trial and error, Gao’s models are now 99.6 percent accurate in predicting PUE. That level of accuracy means he can use the models to test ideas and find new ways to squeeze more efficiency out of Google’s operations.
For example, a couple of months ago Google was forced to take some servers offline for a few days, which would normally make that data center less energy efficient. However, the team was able to use Gao’s models to change the cooling setup temporarily, which reduced the impact of the change on Google’s PUE for that period. Small, ongoing tweaks like this help maintain high levels of output while saving time, energy, and money.
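The kind of what-if comparison described above can be sketched as follows. This is a minimal illustration under stated assumptions: `toy_predict_pue` is a hypothetical stand-in for a trained model, its formula and constants are invented, and the setting name `pumps_running` is made up for the example. The source does not say which cooling settings were actually changed.

```python
from typing import Callable, Dict, List

Setup = Dict[str, float]


def toy_predict_pue(setup: Setup) -> float:
    """Hypothetical stand-in for a trained PUE model (formula is invented)."""
    it_load_kw = setup["it_load_kw"]
    pumps = setup["pumps_running"]
    fixed_overhead_kw = 60.0
    pump_kw = 20.0 * pumps
    # Invented penalty when too few pumps run for the current load.
    shortfall = max(0.0, it_load_kw / 400.0 - pumps)
    penalty_kw = 50.0 * shortfall
    total_kw = it_load_kw + fixed_overhead_kw + pump_kw + penalty_kw
    return total_kw / it_load_kw


def best_setup(candidates: List[Setup],
               predict: Callable[[Setup], float]) -> Setup:
    """Return the candidate with the lowest predicted PUE."""
    return min(candidates, key=predict)


# Some servers are offline, so the IT load is lower than usual (hypothetical).
reduced_load_kw = 800.0
candidates = [
    {"it_load_kw": reduced_load_kw, "pumps_running": float(p)}
    for p in (1, 2, 3, 4)
]

best = best_setup(candidates, toy_predict_pue)
for c in candidates:
    print(f"pumps={c['pumps_running']:.0f}  predicted PUE={toy_predict_pue(c):.4f}")
print(f"recommended pumps: {best['pumps_running']:.0f}")
```

In this sketch the model only ranks options; in keeping with the article, a recommendation like this would still be reviewed by engineers before anything is changed.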
The Rise Of The Machines?
Kava said that the tool may help Google run simulations and refine future designs. But not to worry, Google’s data centers won’t become self-aware anytime soon. While the company is keen on automation, and has recently been acquiring robotics firms, the new machine learning tools won’t be taking over the management of any of its data centers.
“You still need humans to make good judgments about these things,” said Kava. “I still want our engineers to review the recommendations.”
The neural networks’ biggest benefits may be seen in the way Google builds its server farms in years to come. “I can envision using this during the data center design cycle,” said Kava. “You can use it as a forward-looking tool to test design changes and innovations. I know that we’re going to find more use cases.”