These so-called Tensor Processing Units (TPUs) are custom-built chips that Google has been using in its own data centers for almost a year, as Urs Holzle, Google’s senior VP for technical infrastructure, noted in a press conference at I/O. Google says it’s getting “an order of magnitude better-optimized performance per watt for machine learning” and argues that this is “roughly equivalent to fast-forwarding technology about seven years into the future.”
Google also manages to speed up its machine learning algorithms with the TPUs because they don’t need the high precision of standard CPUs and GPUs. Instead of 32-bit precision, the algorithms happily run with a reduced precision of 8 bits, so every operation needs fewer transistors.
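To make the precision trade-off concrete, here is a minimal sketch of how a calculation can be carried out with 8-bit integers instead of 32-bit floating point. Google has not described how the TPU handles reduced precision, so this uses a generic symmetric linear quantization scheme as an assumption; the function names `quantize` and `quantized_dot` and the sample values are hypothetical and chosen only for illustration.

```python
def quantize(values, num_bits=8):
    """Map floats to signed integers using one symmetric scale factor.

    Assumption: a generic linear scheme, not a documented TPU method.
    """
    qmax = 2 ** (num_bits - 1) - 1  # 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the representable range.
    quantized = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return quantized, scale


def quantized_dot(a, b):
    """Approximate a float dot product using integer multiply-accumulate."""
    qa, scale_a = quantize(a)
    qb, scale_b = quantize(b)
    acc = sum(x * y for x, y in zip(qa, qb))  # integer arithmetic only
    # Convert the integer result back to the original floating-point scale.
    return acc * scale_a * scale_b


# Hypothetical sample data standing in for model weights and inputs.
weights = [0.12, -0.53, 0.98, -0.07]
inputs = [1.5, 0.25, -0.75, 2.0]

exact = sum(w * x for w, x in zip(weights, inputs))
approx = quantized_dot(weights, inputs)
print(f"exact={exact:.4f} approx={approx:.4f}")
```

The 8-bit result lands close to the full-precision one, which is the kind of small error that many machine learning workloads can tolerate in exchange for simpler arithmetic.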
If you are using Google’s voice recognition services, your queries are already running on these TPUs today — and if you’re a developer, Google’s Cloud Machine Learning services also run on these chips. AlphaGo, which recently beat the Go world champion, also ran on TPUs.
Holzle said that Google decided to build these application-specific chips instead of using more flexible FPGAs because it was looking for efficiency.
Holzle sadly didn’t want to disclose which foundry is actually making Google’s chips, though he was willing to say that the company is currently using two different revisions in production and that they are manufactured by two different foundries.
With TensorFlow, Google offers its own open source machine learning library, and, unsurprisingly, the company adopted it to run its algorithms on TPUs.