Baidu’s AI team taught a virtual agent the way a human would teach a baby

Baidu’s artificial intelligence research team has achieved a significant milestone: teaching a virtual agent “living” in a 2D environment how to navigate its world using natural language commands, by first teaching it language through positive and negative reinforcement. The especially exciting thing, according to the scientists, is that the agent ended up developing a “zero-shot learning ability,” which essentially means that the AI agent developed a basic sense of grammar.

You probably don’t remember it from personal experience because it happened when you were a baby, but this is basically how parents teach very young kids. You show them images, repeat words and, eventually, with enough positive reinforcement, the kid can associate those words with those images and voilà: it knows the names of things.

Baidu’s big breakthrough, though, is that the agent within its system can apply commands it has learned to new situations. Computers generally aren’t great at taking previously acquired knowledge and applying it to new things. Here’s the explanation, direct from Baidu’s own research team, of why and how its system is different:

Applying past knowledge to a new task is very easy for humans but still difficult for current end-to-end learning machines. Although machines may know what a “dragon fruit” looks like, they can’t perform the task “cut the dragon fruit with a knife” unless they have been explicitly trained with the dataset containing this command. By contrast, our agent demonstrated the ability to transfer what they knew about the visual appearance of a dragon fruit as well as the task of “cut X with a knife” successfully, without explicitly being trained to perform “cut the dragon fruit with a knife.”

Generalizing a previously learned skill to new cases is no small feat for an artificial intelligence. The model shows that systems can learn and apply retained knowledge in a manner similar to humans, even if only in the limited realm of a simplistic, video game-like 2D environment. Which is very unlike me, in fact: I’ll never learn.