MIT’s Computer Science and Artificial Intelligence Lab has devised a method by which robots can understand and respond to voice commands, stated in clear, plain language. The system is advanced enough to understand contextual commands, too, including references made to previously mentioned commands and objects.
The so-called ComText system (short for “commands in context”) created by CSAIL researchers provides “Alexa-like” vocal control of robots, which can demonstrate a contextual understanding not just of previous commands, but also of the objects they interact with and their surrounding environment.
All of this adds up to a robot that users can interact with much as they would with another person. Interfacing with robots remains a big challenge, and a potentially huge barrier to their commercial introduction and use in general consumer-facing applications. Even in industry, it would be far easier to have humans and robots working together if robots understood natural-language voice commands.
ComText works by learning designations for specific objects: you can, for instance, tell it that “this tool I’m holding is my tool,” and in the future, whenever you say something like “hand me my tool,” it will find the right one and retrieve it. The researchers tested ComText using a Baxter model, a two-armed, essentially humanoid robot created by Rethink Robotics.
ComText is made possible by different kinds of memory, including both semantic memory, which covers general facts, and episodic memory, which is tied to specific occurrences and events. In tests, the robot was able to do the right thing in response to testers’ voice commands around 90 percent of the time, which is remarkable. The team hopes to push the limits using more complex input, including multi-step commands and a deepening of the robot’s contextual knowledge.
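To make the two kinds of memory concrete, here is a minimal sketch of how a command system might combine a semantic store of taught facts with an episodic log of events to resolve a contextual request like “hand me my tool.” All names here (`ContextMemory`, `teach`, `resolve`) are hypothetical illustrations, not CSAIL’s actual implementation.

```python
class ContextMemory:
    """Toy model of semantic + episodic memory for grounding commands.

    Hypothetical sketch -- not the real ComText system.
    """

    def __init__(self):
        # Semantic memory: general facts, e.g. labels bound to objects.
        self.semantic = {}
        # Episodic memory: a time-ordered log of observed events.
        self.episodic = []

    def teach(self, label, obj):
        # "This tool I'm holding is my tool" -> bind the label to the object
        # and record the teaching event in the episodic log.
        self.semantic[label] = obj
        self.episodic.append(("teach", label, obj))

    def mention(self, label, obj):
        # Record a passing reference to an object without teaching a fact.
        self.episodic.append(("mention", label, obj))

    def resolve(self, label):
        # "Hand me my tool" -> prefer the taught semantic binding; otherwise
        # fall back to the most recent episodic mention of that label.
        if label in self.semantic:
            return self.semantic[label]
        for event, seen_label, obj in reversed(self.episodic):
            if event == "mention" and seen_label == label:
                return obj
        return None


memory = ContextMemory()
memory.teach("my tool", {"id": 42, "type": "screwdriver"})
print(memory.resolve("my tool"))  # -> {'id': 42, 'type': 'screwdriver'}
print(memory.resolve("your tool"))  # -> None (never taught or mentioned)
```

The split matters because the two stores answer different questions: semantic memory answers “what is X?” regardless of when it was learned, while episodic memory lets the robot fall back on “what was X most recently?” when no general fact exists.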