Researchers train robots to see into the future

Robots usually react in real time: something happens, and they respond. Now researchers at the University of California, Berkeley, are working on a system that lets robots “imagine the future of their actions” so that they can interact with objects they’ve never seen before.

The technology is called visual foresight and it allows “robots to predict what their cameras will see if they perform a particular sequence of movements.”

The researchers write:

These robotic imaginations are still relatively simple for now – predictions made only several seconds into the future – but they are enough for the robot to figure out how to move objects around on a table without disturbing obstacles. Crucially, the robot can learn to perform these tasks without any help from humans or prior knowledge about physics, its environment or what the objects are. That’s because the visual imagination is learned entirely from scratch from unattended and unsupervised exploration, where the robot plays with objects on a table. After this play phase, the robot builds a predictive model of the world, and can use this model to manipulate new objects that it has not seen before.
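The play-then-plan procedure described above amounts to model-predictive control: the robot imagines many candidate action sequences, rolls its learned model forward for each, and executes the one whose predicted outcome looks best. The sketch below is illustrative only; it replaces the learned video model with a trivial 2-D position model, and the names `predict`, `plan`, the horizon, and the candidate count are assumptions for illustration, not the researchers' actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def predict(state, action):
    # Toy stand-in for the learned visual model: the "world" is just a
    # 2-D object position and an action is a small push. The real system
    # predicts camera images, but the planning logic is the same.
    return state + action

def plan(state, goal, horizon=3, n_candidates=100):
    # Model-predictive planning: imagine many candidate push sequences,
    # roll the model forward for each, and keep the sequence whose
    # predicted outcome lands nearest the goal.
    best_seq, best_cost = None, np.inf
    for _ in range(n_candidates):
        seq = rng.uniform(-1.0, 1.0, size=(horizon, 2))
        s = state
        for a in seq:                    # play the sequence out in "imagination"
            s = predict(s, a)
        cost = np.linalg.norm(s - goal)  # how far the imagined outcome misses
        if cost < best_cost:
            best_seq, best_cost = seq, cost
    return best_seq

# Usage: find pushes that move an object from (0, 0) toward (2, 1).
best = plan(np.zeros(2), np.array([2.0, 1.0]))
```

Because planning happens entirely inside the model, no real-world trial is consumed until the chosen sequence is executed.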

“In the same way that we can imagine how our actions will move the objects in our environment, this method can enable a robot to visualize how different behaviors will affect the world around it,” said Sergey Levine, assistant professor in Berkeley’s Department of Electrical Engineering and Computer Sciences. “This can enable intelligent planning of highly flexible skills in complex real-world situations.”

The system uses convolutional recurrent video prediction to “predict how pixels in an image will move from one frame to the next based on the robot’s actions.” This means that it can play out scenarios before it begins touching or moving objects.
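A minimal sketch of what “predicting how pixels move” means, under a deliberately simplified assumption: here the commanded action directly shifts the image, whereas the real system learns action-conditioned per-pixel motion with a convolutional recurrent network. The function names `predict_next_frame` and `rollout` are hypothetical, chosen for illustration.

```python
import numpy as np

def predict_next_frame(frame, action):
    # Toy stand-in for an action-conditioned video-prediction model:
    # shift the image by the commanded (dy, dx). The key idea it mirrors
    # is predicting where existing pixels move between frames rather
    # than synthesizing new pixels from scratch.
    dy, dx = action
    return np.roll(frame, shift=(dy, dx), axis=(0, 1))

def rollout(frame, actions):
    # Play out a whole candidate action sequence in "imagination",
    # feeding each predicted frame back in as the next input.
    frames = [frame]
    for a in actions:
        frames.append(predict_next_frame(frames[-1], a))
    return frames
```

Chaining the predictor on its own output is what lets the system play out a multi-step scenario before touching anything.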

“In the past, robots have learned skills with a human supervisor helping and providing feedback. What makes this work exciting is that the robots can learn a range of visual object manipulation skills entirely on their own,” said Chelsea Finn, a doctoral student in Levine’s lab and inventor of the original DNA (dynamic neural advection) model.

The robot needs no special information about its surroundings and no special sensors. It uses a camera to observe the scene and then acts accordingly, much as a person can predict what will happen when pushing objects on a table into one another.

“Children can learn about their world by playing with toys, moving them around, grasping, and so forth. Our aim with this research is to enable a robot to do the same: to learn about how the world works through autonomous interaction,” Levine said. “The capabilities of this robot are still limited, but its skills are learned entirely automatically, and allow it to predict complex physical interactions with objects that it has never seen before by building on previously observed patterns of interaction.”