When id Software’s John Carmack released Doom in 1993, he had no inkling that his gory first-person shooter — one of the first to feature a 3D environment, and easily the most popular at that time — would help spark a revolution in how machines process information.
Six years later, Nvidia released the GeForce 256, the first graphics processing unit (GPU) built specifically to produce 3D graphics for the burgeoning game industry. In the 17 years since, GPUs have become not merely a staple of high-end gaming, their original purpose, but a driving force behind major advances in artificial intelligence (AI).
Thanks to the creation of ever-more powerful GPUs, deep machine learning and neural networks are poised to change almost every aspect of society — from the jobs we pursue and the cars we drive to the diagnosis we receive when we go to the doctor.
In the first (Neural Networks Made Easy) and second (Why the Future of Deep Learning Depends on Good Data) parts of “A Mathless Guide to Neural Networks,” we explained how deep learning works and why data is so important to the success of AI, respectively. In this third installment of the series, we’ll focus on some of the processing developments that helped usher in the deep learning boom of today. For starters, it helps to understand the differences between how GPUs and CPUs work.
GPUs vs CPUs
By now, you’re familiar with the term ‘central processing unit,’ or CPU. It’s the brain inside your computer that crunches code, letting the machine do everything from calculating numbers to playing high-def video to running several programs at once, seamlessly. Remember that endless “Intel Inside” marketing campaign? It was all about the CPU.
But the CPU is not your computer’s only brain. The machine also contains other bits of silicon that are better than the CPU at specific tasks. The most important of these is the graphics processing unit (GPU). These chips are similar to CPUs in some ways, but vastly different in others.
Most modern CPUs have between two and eight “cores” (essentially, mini-CPUs), each of which can handle a different operation at the same time. Any of these cores can handle any operation you throw at them, and they can switch seamlessly between them. That’s why you can watch a video while downloading software and Snapchatting with your BFFs without noticing any hiccups.
Imagine a circus performer who’s juggling a baseball, a bowling pin, an axe, an apple, and an egg. Every so often, he takes a bite of the apple, drops the axe and picks up a flaming torch. That’s a CPU, essentially a jack of all trades that can easily multitask.
By contrast, modern GPUs have thousands of cores, but they’re much simpler in design. Each core can only do one specific thing, but they can all do it at exactly the same time, very quickly, over and over.
The GPU circus performer can only handle bowling pins — but he can juggle 10,000 at a time, whereas a CPU can’t really tackle that many bowling pins since it’s so busy being flexible and multitasking.
That makes GPUs perfect for workloads built on huge numbers of repetitive calculations, like generating the billions of polygons used to create 3D graphics in gaming environments. It also makes them ideal for training neural networks, which must run the same operations over and over again on massive amounts of data.
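To see the data-parallel idea in miniature, here is a rough sketch using NumPy’s vectorized operations as a CPU-side stand-in for what a GPU does in hardware (the array and the doubling operation are invented for illustration):

```python
import numpy as np

# A GPU core does one simple thing; thousands of them do it at once.
# NumPy's vectorized operations are a rough CPU-side sketch of that
# idea: one instruction applied across a whole array of data.
values = np.arange(1_000_000)

# The "jack of all trades" way: one element at a time, in sequence.
serial = [v * 2 for v in values]

# The data-parallel way: the same multiply applied to every element
# in a single vectorized operation.
parallel = values * 2

print(bool((parallel == serial).all()))  # True: same result, different execution model
```

Both paths produce identical answers; the difference is that the second expresses the work as one operation over all the data at once, which is exactly the shape of problem a GPU is built for.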
GPUs at play
GPUs work their magic by doing complex mathematical calculations billions of times every second.
Video game environments are made up of tiny triangles, which are combined in different ways to form the land, sky, mountains, spaceships, and bug-eyed monsters you see on screen. These triangles are made up of different numbers indicating their location within the environment, their angle relative to other triangles, their color, texture, and so on. GPUs crunch these numbers and turn them into pixels on a flat screen display. Every time the screen refreshes or the scene changes, even slightly, the GPU has to do more math to generate new pixels. This is how you end up with the rich 3D gaming environments of Call of Duty or Grand Theft Auto.
For a high-definition display running at 60 frames per second, the GPU must generate roughly 124 million pixels every second. Even an extremely powerful CPU might take a second or two to draw a single frame. But divvy up the job among thousands of GPU cores, all operating simultaneously, and it happens nearly instantaneously (an approach known as parallelism).
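The arithmetic behind that pixel budget is simple, assuming a standard 1920×1080 high-definition display:

```python
# Rough pixel budget for one second of high-definition gaming.
# 1920x1080 is an assumed resolution; the article's figure is in
# this ballpark.
width, height = 1920, 1080
frames_per_second = 60

pixels_per_frame = width * height                        # 2,073,600 pixels
pixels_per_second = pixels_per_frame * frames_per_second

print(pixels_per_second)  # 124416000, roughly 124 million pixels a second
```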
Loosely speaking, it’s the difference between hiring Michelangelo to paint a fresco on your ceiling and hiring thousands of artisans, each responsible for a single square inch of the surface.
The overwhelming horsepower of GPUs is why, in 2010, the U.S. Air Force was able to build a supercomputer by daisy-chaining together 1,760 Sony PlayStation 3 gaming consoles. At the time it was the most powerful computer in the U.S. Defense Department, yet it was more than 90 percent cheaper to build than a traditional supercomputer and used one-tenth the electricity.
The elephant in the RAM
Using GPUs for image recognition works in reverse. Instead of converting numbers into pictures, the processor converts pictures into numbers.
Let’s say you’ve built a neural network consisting of thousands of GPUs, each of which has thousands of cores — essentially, a supercomputer. You want to teach this supercomputer how to identify an elephant. Utilizing a method known as supervised learning, you’d start by feeding the network hundreds of thousands of images of elephants, taken from every conceivable angle, each labeled “elephant.” The network would map every edge, texture, shape, and color in each image, attempting to identify the mathematical patterns consistent with images carrying that label.
Along the way, you’ll want to throw in images containing no elephants, so the network doesn’t conclude that everything it sees is an elephant. Each labeled example, elephant or not, helps the network gradually adjust its model and improve its accuracy. The network goes through this process with every image, refining its elephant-seeking algorithm with each new pass.
When you then show your supercomputer a new image, it predicts whether the image is in fact a pachyderm. If the network gets it wrong, its errors are fed back through the network to adjust the model (a process known as backpropagation). When its ability to recognize images stops improving, the training is done.
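A toy version of this training loop can be sketched in a few lines of Python. Everything here is invented for illustration: each “image” is boiled down to just two made-up features, and the model is a single-layer classifier rather than a deep network, but the rhythm of guess, measure the error, and adjust is the same.

```python
import math

# Toy "elephant detector." Each image is reduced to two invented
# features (hypothetical stand-ins for the patterns a real network
# would extract). Label 1 = elephant, 0 = not an elephant.
data = [
    ((0.9, 0.8), 1), ((0.8, 0.9), 1), ((0.95, 0.7), 1),
    ((0.1, 0.2), 0), ((0.2, 0.1), 0), ((0.15, 0.3), 0),
]

w1, w2, b = 0.0, 0.0, 0.0  # the model starts out knowing nothing
lr = 0.5                   # learning rate: how big each correction is

def predict(x1, x2):
    # Squash a weighted sum into a 0..1 score; above 0.5 means "elephant."
    z = w1 * x1 + w2 * x2 + b
    return 1 / (1 + math.exp(-z))

for epoch in range(1000):            # successive passes over the images
    for (x1, x2), label in data:
        error = predict(x1, x2) - label  # how wrong was the guess?
        # Feed the error back to nudge the model (backpropagation,
        # in its simplest single-layer form).
        w1 -= lr * error * x1
        w2 -= lr * error * x2
        b  -= lr * error

print(predict(0.9, 0.85) > 0.5)  # score an unseen "elephant" image
```

After enough passes, the model has settled on weights that separate elephants from non-elephants, without ever being told what to look for.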
Here’s the cool part: You haven’t told the network that an elephant has dusky grey skin, a long flexible trunk, a rounded back, and thick legs. You’ve just said, “Here are a bunch of objects called ‘elephant,’ go figure out what they have in common.” In effect, the network taught itself what an elephant looks like.
Weapons of math destruction
One reason GPUs are so good at training neural networks is that they excel at something called matrix multiplication: taking one table of numbers (say, the values of pixels in one part of an image) and multiplying it by another table (the values in another part). Because neural networks rely heavily on matrix multiplication, using GPUs slashes the time it takes to train them, from weeks or months down to days or even hours in some cases.
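Here is what a single matrix multiplication looks like, using NumPy on a CPU as a stand-in (the numbers are invented; a GPU would run enormous batches of these products simultaneously):

```python
import numpy as np

# Two small tables of numbers: invented values standing in for a patch
# of image pixels and a layer of network weights.
pixels = np.array([[1.0, 2.0],
                   [3.0, 4.0]])
weights = np.array([[0.5, 0.0],
                    [0.0, 0.5]])

# One matrix multiplication: each output cell is a sum of element-wise
# products. A GPU runs huge batches of these in parallel.
result = pixels @ weights

print(result.tolist())  # [[0.5, 1.0], [1.5, 2.0]]
```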
Modern GPUs tend to have a lot of on-board memory, so they can crunch numbers without having to shuttle data back and forth to the computer’s main memory, a computationally expensive task. That makes them faster. They’re also eminently scalable; the more GPUs you throw into the mix, the more calculations they can handle at once. And they’re programmable, so you can teach them to perform different tasks — like handwriting or speech recognition.
How good are GPU-driven neural nets at recognizing things in images? Better than people. In 2015, both Google and Microsoft built deep neural networks that were more accurate than humans at identifying objects in images in the annual ImageNet computer vision challenge. Graphics chipmaker Nvidia claims to have made GPU-based neural network training 50 times faster in just three years.
The reason GPUs have advanced so quickly: money. Worldwide, video games brought in $100 billion last year, which is more than movies, music, and books combined. The enormous profitability of games funded massive investment in research and development into GPUs and other technologies. Last year, Nvidia spent more than $2 billion developing a single GPU created specifically for use in deep neural nets, while Google and other companies are working on tensor processing units (TPUs), chips designed expressly for neural networks that can handle even larger workloads more efficiently.
This investment is paying off in areas far beyond the twitchy thumbs of teenagers. Google uses GPU-powered neural nets to recognize voice commands in Android and translate foreign street signs in Google Maps. Facebook uses them to recognize the faces of your friends and customize your news feed. Neural nets provide the intelligence inside some driverless cars, allowing them to tell a stop sign from a tree. They help diagnosticians ‘see’ the difference between a tumor and healthy tissue in an MRI and detect early warning signs of cancer. They can even locate cracks in the components of nuclear power plants.
They’re also pretty good at playing Super Smash Bros.
Some day, discoveries enabled by a GPU-powered deep neural net might just save your life, an impressive, if ironic, by-product of the first-person shooter.