Robots learn to perform chores by watching YouTube

Learning has been a holy grail in robotics for decades. If these systems are going to thrive in unpredictable environments, they’ll need to do more than just respond to programming: they’ll need to adapt and learn. What’s become clear the more I read and speak with experts is that true robotic learning will require a combination of many solutions.

Video is an intriguing solution that’s been the centerpiece of a lot of recent work in the space. Roughly this time last year, we highlighted WHIRL (in-the-Wild Human Imitating Robot Learning), a CMU-developed algorithm designed to train robotic systems by watching a recording of a human executing a task.

This week, CMU Robotics Institute assistant professor Deepak Pathak is showcasing VRB (Vision-Robotics Bridge), an evolution of WHIRL. As with its predecessor, the system uses video of a human to demonstrate the task, but the update no longer requires the human to perform the task in a setting identical to the one in which the robot will operate.

“We were able to take robots around campus and do all sorts of tasks,” PhD student Shikhar Bahl notes in a statement. “Robots can use this model to curiously explore the world around them. Instead of just flailing its arms, a robot can be more direct with how it interacts.”

The robot is watching for a few key pieces of information, including contact points and trajectory. The team uses opening a drawer as an example: the contact point is the handle, and the trajectory is the direction in which the drawer opens. “After watching several videos of humans opening drawers,” CMU notes, “the robot can determine how to open any drawer.”
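To make the idea of a contact point plus a trajectory concrete, here is a minimal Python sketch. It is not VRB’s actual model, which the article does not describe in detail; the `Affordance` class, the `aggregate` function and the simple averaging step are all hypothetical, chosen only to illustrate how several observed demonstrations might be condensed into one “where to touch, which way to move” estimate.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Affordance:
    """Hypothetical container for what the robot extracts from one video."""
    contact_point: Tuple[float, float]  # (x, y) in normalized image coordinates
    direction: Tuple[float, float]      # unit vector of motion after contact


def aggregate(observations: List[Affordance]) -> Affordance:
    """Average several observations into one estimate (illustrative assumption)."""
    if not observations:
        raise ValueError("need at least one observation")
    n = len(observations)
    # Mean contact point across all demonstrations.
    cx = sum(o.contact_point[0] for o in observations) / n
    cy = sum(o.contact_point[1] for o in observations) / n
    # Sum the directions, then renormalize to get a unit vector.
    dx = sum(o.direction[0] for o in observations)
    dy = sum(o.direction[1] for o in observations)
    norm = math.hypot(dx, dy)
    if norm == 0:
        raise ValueError("observed directions cancel out")
    return Affordance((cx, cy), (dx / norm, dy / norm))


# Two made-up drawer demonstrations: handles at slightly different
# positions, both pulled in the same image direction.
drawer = aggregate([
    Affordance((0.4, 0.5), (0.0, -1.0)),
    Affordance((0.6, 0.5), (0.0, -1.0)),
])
print(drawer)
```

A real system would predict these quantities from pixels with a learned model rather than average hand-labeled values; the sketch only shows the shape of the information involved.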

Obviously, not all drawers behave the same way. Humans have gotten pretty good at opening drawers, but that doesn’t mean the occasional weirdly built cabinet won’t give us some trouble. One of the key tricks to improving outcomes is training on larger datasets. CMU is relying on videos from databases like Epic Kitchens and Ego4D, the latter of which has “nearly 4,000 hours of egocentric videos of daily activities from across the world.”

Bahl notes that there’s a massive archive of potential training data waiting to be watched. “We are using these datasets in a new and different way,” the researcher says. “This work could enable robots to learn from the vast amount of internet and YouTube videos available.”