Waymo and DeepMind mimic evolution to develop a new, better way to train self-driving AI

Alphabet’s autonomous driving and robotaxi company Waymo does a lot of training to refine and improve the artificial intelligence that powers its self-driving software. Recently, it teamed up with fellow Alphabet company and AI specialist DeepMind to develop new methods that make that training better and more efficient.

The two worked together to bring a training method called Population Based Training (PBT for short) to bear on Waymo’s challenge of building better virtual drivers, and the results were impressive: DeepMind says in a blog post that using PBT reduced false positives by 24% in a network that identifies and places boxes around pedestrians, bicyclists and motorcyclists spotted by a Waymo vehicle’s many sensors. Not only that, but it also resulted in savings in both training time and resources, using about 50% of each compared to the standard methods that Waymo was using previously.

To step back a little, let’s look at what PBT even is. Basically, it’s a method of training that takes its cues from how Darwinian evolution works. Neural nets essentially work by trying something and then measuring the results against some kind of standard to see if their attempt is more “right” or more “wrong” based on the desired outcome. In the training methods that Waymo was using, they’d have multiple neural nets working independently on the same task, each with a different “learning rate,” or the degree to which a net can deviate in its approach each time it attempts a task (like identifying objects in an image, for instance). A higher learning rate means much more variety in the quality of the outcome, and that swings both ways; a lower learning rate means much steadier progress, but a low likelihood of getting big positive jumps in performance.
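To make the learning-rate trade-off concrete, here is a minimal Python sketch. It is a toy and not anything Waymo or DeepMind actually runs: the function `gradient_descent`, the loss `x**2` and the three rates are all assumptions chosen for illustration.

```python
def gradient_descent(learning_rate, steps=20, start=10.0):
    """Minimise the toy loss f(x) = x**2 and return the final loss."""
    x = start
    for _ in range(steps):
        grad = 2 * x                 # derivative of x**2
        x -= learning_rate * grad    # bigger rate = bigger change per step
    return x * x

# Hypothetical rates: too cautious, well matched, and too aggressive.
for lr in (0.01, 0.4, 1.1):
    print(lr, gradient_descent(lr))
```

In this toy, the smallest rate improves slowly but steadily, the middle one converges quickly, and the largest overshoots further on every step and ends up worse than where it started.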

But all that comparative training requires a huge amount of resources, and sorting the good from the bad relies on either the gut feeling of individual engineers or a massive-scale search with a manual component, in which engineers “weed out” the worst-performing neural nets to free up processing capacity for better ones.

What DeepMind and Waymo did with this experiment was essentially automate that weeding, automatically killing the “bad” training runs and replacing them with better-performing spin-offs of the best-in-class networks running the task. That’s where evolution comes in, since it’s kind of a process of artificial natural selection. Yes, that does make sense. Read it again.
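The weed-out-and-replace loop can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not DeepMind’s implementation: each “network” is a single number `theta` trained on the toy loss `theta**2`, and the names `train_step`, `score` and `pbt`, the population size, the bottom-quarter cutoff and the 0.8/1.2 perturbation factors are all hypothetical.

```python
import random

def train_step(member):
    # One gradient step on the toy loss theta**2.
    member["theta"] -= member["lr"] * 2 * member["theta"]

def score(member):
    # Higher is better, so use the negative loss.
    return -(member["theta"] ** 2)

def pbt(population_size=8, rounds=10, steps_per_round=5, seed=0):
    rng = random.Random(seed)
    # Every member starts from the same weights with a random learning rate.
    population = [
        {"theta": 10.0, "lr": 10 ** rng.uniform(-4, -1)}
        for _ in range(population_size)
    ]
    for _ in range(rounds):
        for member in population:
            for _ in range(steps_per_round):
                train_step(member)
        # Rank the population, best first.
        population.sort(key=score, reverse=True)
        cutoff = max(1, population_size // 4)
        top, bottom = population[:cutoff], population[-cutoff:]
        for loser in bottom:
            parent = rng.choice(top)
            loser["theta"] = parent["theta"]                     # copy the winner
            loser["lr"] = parent["lr"] * rng.choice([0.8, 1.2])  # mutate its rate
    return max(population, key=score)

best = pbt()
print(best)
```

The key difference from the independent runs described earlier is that poor performers do not keep consuming compute: they restart from a winner’s weights with a slightly mutated learning rate.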

In order to avoid potential pitfalls with this method, DeepMind tweaked some aspects after early research, including evaluating models on fast, 15-minute intervals, building out strong validation criteria and example sets to ensure that tests really were building better-performing neural nets for the real world, and not just good pattern-recognition engines for the specific data they’d been fed.

Finally, the companies also developed a sort of “island population” approach by building sub-populations of neural nets that only competed with one another in limited groups, similar to how animal populations cut off from larger groups (i.e. limited to islands) develop very different and sometimes better-adapted characteristics than their mainland cousins.
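One way to picture the island idea is to restrict the replacement step so that a weak net can only be overwritten by the best net in its own group. The Python sketch below is an assumption about how that might look, not the companies’ code: the `island`, `score`, `weights` and `lr` fields and the function `replace_worst_per_island` are hypothetical, and `weights` is a single number standing in for a full network.

```python
import random
from collections import defaultdict

def replace_worst_per_island(population, rng):
    """In each island, overwrite the lowest scorer with a mutated copy of the highest."""
    islands = defaultdict(list)
    for member in population:
        islands[member["island"]].append(member)
    for members in islands.values():
        if len(members) < 2:
            continue  # nothing to compete against
        best = max(members, key=lambda m: m["score"])
        worst = min(members, key=lambda m: m["score"])
        worst["weights"] = best["weights"]                  # copy within the island only
        worst["lr"] = best["lr"] * rng.choice([0.8, 1.2])   # small mutation
    return population

population = [
    {"island": 0, "score": 0.9, "weights": 1.0, "lr": 0.01},
    {"island": 0, "score": 0.2, "weights": 2.0, "lr": 0.10},
    {"island": 1, "score": 0.6, "weights": 3.0, "lr": 0.03},
    {"island": 1, "score": 0.5, "weights": 4.0, "lr": 0.05},
]
replace_worst_per_island(population, random.Random(0))
```

Note that the weakest net on island 1 copies its own island’s leader even though island 0 holds the best net overall, which is what keeps the groups from all collapsing onto a single lineage.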

Overall, it’s a super interesting look at how deep learning and artificial intelligence can have a real impact on technology that is already, in some cases, involved in our daily lives and will soon be even more so.