Deep Science: Robots, meet world

Research papers come out far too frequently for anyone to read them all. That’s especially true in the field of machine learning, which now affects (and produces papers in) practically every industry and company. This column aims to collect some of the most relevant recent discoveries and papers — particularly in, but not limited to, artificial intelligence — and explain why they matter.

This edition, we have a lot of items concerned with the interface between AI or robotics and the real world. Of course most applications of this type of technology are ultimately meant to operate in the real world, but this research is specifically about the inevitable difficulties that arise from limitations on either side of the real-virtual divide.

One issue that constantly comes up in robotics is how slow things actually go in the real world. Naturally some robots trained on certain tasks can do them with superhuman speed and agility, but for most that’s not the case. They need to check their observations against their virtual model of the world so frequently that tasks like picking up an item and putting it down can take minutes.

What’s especially frustrating about this is that the real world is the best place to train robots, since ultimately they’ll be operating in it. One approach to addressing this is to increase the value of every hour of real-world testing you do, which is the goal of this project over at Google.

In a rather technical blog post, the team describes the challenge of using and integrating data from multiple robots learning and performing multiple tasks. It’s complicated, but essentially they create a unified process for assigning and evaluating tasks, and for adjusting future assignments and evaluations based on the results. More intuitively, they create a process by which success at task A improves the robots’ ability to do task B, even if the two tasks are different.

Humans do it — knowing how to throw a ball well gives you a head start on throwing a dart, for instance. Making the most of valuable real-world training time is important, and this work shows there’s plenty of optimization left to do.
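The post is light on implementation detail, but the core idea of sharing experience across tasks can be sketched in a few lines. Here is a toy illustration (hypothetical task names and success checks, not Google's actual system) of a replay buffer that offers every episode to every task, so data gathered while attempting one task can still count as training signal for another:

```python
import random
from collections import defaultdict

# Hypothetical per-task success checks; each task decides for itself whether
# an episode (here just a dict of outcomes) counts as a success.
SUCCESS_DETECTORS = {
    "lift_object":  lambda ep: ep["object_height"] > 0.10,
    "place_object": lambda ep: ep["object_on_target"],
}

class SharedReplayBuffer:
    """Toy multi-task buffer: every episode is offered to every task, so
    experience gathered while attempting task A still teaches task B."""
    def __init__(self):
        self.per_task = defaultdict(list)

    def add_episode(self, episode):
        for task, is_success in SUCCESS_DETECTORS.items():
            # The cross-task sharing step: store the episode under each task,
            # labeled with that task's own notion of success.
            self.per_task[task].append((episode, is_success(episode)))

    def sample(self, task, k=4):
        pool = self.per_task[task]
        return random.sample(pool, min(k, len(pool)))

buffer = SharedReplayBuffer()
buffer.add_episode({"attempted_task": "lift_object",
                    "object_height": 0.15, "object_on_target": False})
print(buffer.sample("place_object"))
```

In a real system each task would also feed its shared episodes to its own learner, but even this skeleton captures why an hour spent on task A isn't wasted on task B.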

Another approach is to improve the quality of simulations so they’re closer to what a robot will encounter when it takes its knowledge to the real world. That’s the goal of the Allen Institute for AI’s THOR training environment and its newest denizen, ManipulaTHOR.

Animated image of a robot navigating a virtual environment and moving items around.

Image Credits: Allen Institute

Simulators like THOR provide an analogue to the real world where an AI can learn basic knowledge like how to navigate a room to find a specific object — a surprisingly difficult task! Simulators balance the need for realism with the computational cost of providing it, and the result is a system where a robot agent can spend thousands of virtual “hours” trying things over and over with no need to plug them in, oil their joints and so on.

ManipulaTHOR gives a physical presence to the robot in the simulator, letting it interact with objects like drawers realistically. If you ask a household robot for a pen, what’s the best way for it to search the office for one? How can it open drawers efficiently without knocking things over? How should it grab the pen and close the drawer afterward? Tasks like these are best attempted via physical simulation, such as within AI2-THOR.
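For a feel of what working against a simulator like this looks like in code, here's a minimal sketch using AI2-THOR's basic Python API. Only standard navigation actions are shown; the ManipulaTHOR arm agent adds its own arm-control actions not covered here, and the crude search policy is purely illustrative:

```python
# Minimal sketch of driving an agent in AI2-THOR (pip install ai2thor).
from ai2thor.controller import Controller

controller = Controller(scene="FloorPlan1")  # a small kitchen scene

# Crude search policy: rotate and step forward until a mug comes into view.
for _ in range(50):
    event = controller.step(action="RotateRight")
    mugs = [o for o in event.metadata["objects"]
            if o["objectType"] == "Mug" and o["visible"]]
    if mugs:
        print("Found a mug:", mugs[0]["objectId"])
        break
    event = controller.step(action="MoveAhead")
    if not event.metadata["lastActionSuccess"]:
        controller.step(action="RotateLeft")  # bumped into something; turn away

controller.stop()
```

The appeal of the simulator is that a loop like this can run thousands of times faster than real time, on fake furniture that never gets knocked over for good.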

Sometimes the real world is the only source for information, though, such as in evaluating how humans use a prosthesis or exoskeleton. Simulation data won’t do it here — actual use is crucial. This Army Research Laboratory project is looking in particular at how an ankle-supporting “exoboot” could pay attention to more complex body signals to adjust the help it provides.

A soldier walks on a treadmill using a small exoskeleton attached to his boots and knees.

Image Credits: ARL

In a new study, they collected brain and muscle signals as well as motion-tracking data, with the intention of creating a vocabulary of body states that the boot can recognize quickly and algorithmically — no need for the user to hit an “I’m tired” button or an “I’m carrying a heavy pack” switch. Understanding these states automatically may be the difference between the boot being a useful tool and a clumsy burden.
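The study itself is about the signal science rather than any particular algorithm, but the general shape of the problem, turning windows of raw muscle and motion data into a discrete, quickly recognizable body state, looks something like the following hypothetical sketch (fabricated data and state labels, off-the-shelf scikit-learn tooling rather than anything ARL describes):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical body states the boot might need to distinguish.
STATES = ["rested_walk", "fatigued_walk", "loaded_walk"]

def window_features(emg, accel, window=200):
    """Summarize raw EMG and acceleration streams into per-window features:
    mean rectified EMG, EMG variance, mean acceleration magnitude."""
    feats = []
    for start in range(0, len(emg) - window, window):
        e = emg[start:start + window]
        a = accel[start:start + window]
        feats.append([np.mean(np.abs(e)), np.var(e),
                      np.linalg.norm(a, axis=1).mean()])
    return np.array(feats)

# Fake signals standing in for a labeled treadmill session.
rng = np.random.default_rng(0)
emg = rng.normal(size=6000)          # one EMG channel
accel = rng.normal(size=(6000, 3))   # 3-axis accelerometer
X = window_features(emg, accel)
y = rng.integers(0, len(STATES), size=len(X))

clf = RandomForestClassifier(n_estimators=50).fit(X, y)

# At runtime the boot would classify each new window and adjust assistance.
print("Detected state:", STATES[clf.predict(X[-1:])[0]])
```

The hard part, of course, is collecting labeled data that reflects how real soldiers actually move and tire, which is exactly why this has to happen on a treadmill and not in a simulator.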

A similar question is addressed differently in another ARL project aimed at building conversational models so that soldiers and robots can communicate naturally and efficiently in the field. On a battlefield, as in a nuclear power plant, it’s important that an automated system can work alongside people and respond to common requests as well as execute its higher-level goals. The needs of a conversational agent in the field are very different from those of the smartphone and smart-speaker assistants that Google, Apple, Amazon and others have poured so much work into, so considerably more research is needed.

Robots must be able to operate safely around one another too, and collaborate if necessary. Coordinating a group of 5-10 drones so they don’t crash into each other or the landscape is a difficult and constantly evolving problem. This EPFL study shows that a relatively simple set of rules and observations can give flying drones a good idea not just of how to avoid obstacles and other drones, but how to do so in coordination with those drones by predicting their movements as well.

For instance, drone A may have to choose between going left or right around an upcoming obstacle, and it can see that it has plenty of room to avoid drone B to its left. But what if drone B has no choice but to go right around its own obstacle? If drone A is not aware of that, it may commit to an overlapping course that leads to a delay or a crash if drone B can’t react in time. But if it knows that drone B will soon have to veer right, that information feeds into its own decision-making and it can go the other way, accepting that a slightly less efficient path will let the group as a whole move forward faster.
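The EPFL controller is more sophisticated than this, but the key ingredient, predicting a neighbor's near-future position before choosing your own detour, can be illustrated with a toy example (all geometry and parameters invented for the sake of the sketch):

```python
import numpy as np

def predict(pos, vel, horizon=1.0):
    """Constant-velocity guess at where a neighboring drone will be."""
    return pos + vel * horizon

def choose_detour(own_pos, neighbor_pos, neighbor_vel, clearance=2.0):
    """Pick a left or right detour around an obstacle directly ahead,
    avoiding where the neighbor is *predicted* to be, not where it is now."""
    future_neighbor = predict(neighbor_pos, neighbor_vel)
    left = own_pos + np.array([-clearance, 1.0])
    right = own_pos + np.array([clearance, 1.0])
    if np.linalg.norm(left - future_neighbor) > np.linalg.norm(right - future_neighbor):
        return "left"
    return "right"

# Drone B starts off to our right but is veering toward us to dodge its own
# obstacle, so the prediction says the right-hand gap will close and the
# better detour is to the left.
print(choose_detour(own_pos=np.array([0.0, 0.0]),
                    neighbor_pos=np.array([3.0, 0.0]),
                    neighbor_vel=np.array([-2.0, 1.0])))
```

Swap the prediction out for "avoid where the neighbor is right now" and the same scenario ends in exactly the overlapping-course problem described above.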

A different approach can be found in this Georgia Tech study, which looked at how to deploy robots that are “about as dumb as they get,” according to professor Dana Randall, so that they can accomplish complex tasks or those that require teamwork. Their experiment combined real-world observation with simulator work not unlike that described above; they found that the dumb robots, using magnets built into their bodies, naturally form collaborative clumps that can move objects heavier than a single one could — no higher intelligence required. This could be helpful for accomplishing tasks without supervision (in its non-AI sense) and with cheap, minimal robot agents.
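A toy model makes the emergent behavior easier to picture. The sketch below (my simplification, not the Georgia Tech setup) has grid robots that random-walk with no plan at all and simply stick when they touch, standing in for the magnetic coupling, and clumps form anyway:

```python
import random

# Toy model: robots on a grid random-walk with no plan, and any that end up
# adjacent stick together (a stand-in for the magnetic coupling).
SIZE, N_ROBOTS, STEPS = 20, 30, 500
positions = {i: (random.randrange(SIZE), random.randrange(SIZE))
             for i in range(N_ROBOTS)}
stuck = set()

def adjacent(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1]) == 1

for _ in range(STEPS):
    for i, (x, y) in positions.items():
        if i in stuck:
            continue  # once part of a clump, stop moving independently
        dx, dy = random.choice([(1, 0), (-1, 0), (0, 1), (0, -1)])
        positions[i] = ((x + dx) % SIZE, (y + dy) % SIZE)
        touching = [j for j in positions
                    if j != i and adjacent(positions[i], positions[j])]
        if touching:
            stuck.add(i)
            stuck.update(touching)

print(f"{len(stuck)} of {N_ROBOTS} robots ended up in clumps")
```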

Perhaps the biggest collaborative task of all will be the enormous, ever-changing ecosystem of autonomous vehicles moving about a city. One big part of that task is creating a working model of the more (but not totally) static side of the data, the city itself. We’ve seen Google Street View cars and other observation vehicles for years, but EPFL is building something more complete with its ScanVan’s omnidirectional capture gear.

The ScanVan's interesting mirror-based capture tech.

Image Credits: EPFL

“The goal is to take advantage of a device that is able to see the full sphere surrounding it to capture every aspect of the scene within a single shot,” said researcher Nils Hamel. Efficiently capturing and integrating 3D and RGB imagery is a worthwhile endeavor on its own, but the team suggests that the real advance comes from being able to do so over and over, adding an element of time to the resulting model. What parts of the city see what kinds of changes, whether to lighting, population, foliage or traffic, and at what times?
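One simple way to think about that temporal element: if each pass over a street yields a registered point cloud, diffing voxelized versions of two passes flags what appeared or disappeared in between. The sketch below is a generic illustration of that idea, not EPFL's pipeline:

```python
import numpy as np

def voxelize(points, voxel=0.5):
    """Map a point cloud (N x 3 array, metres) to its set of occupied voxels."""
    return set(map(tuple, np.floor(points / voxel).astype(int)))

def changed_cells(scan_t0, scan_t1, voxel=0.5):
    """Voxels occupied in one pass but not the other (parked cars gone,
    foliage grown, street furniture added between capture times)."""
    a, b = voxelize(scan_t0, voxel), voxelize(scan_t1, voxel)
    return {"appeared": b - a, "disappeared": a - b}

# Two fabricated passes over the same block, months apart.
rng = np.random.default_rng(1)
pass_one = rng.uniform(0, 10, size=(1000, 3))
pass_two = np.vstack([pass_one[:900], rng.uniform(0, 10, size=(100, 3))])
diff = changed_cells(pass_one, pass_two)
print(len(diff["appeared"]), "voxels appeared;",
      len(diff["disappeared"]), "disappeared")
```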

The team also had to reckon with the fact that such data could be used for evil, essentially acting as a universal surveillance tool — so they built it from the ground up to obscure identifying information of individuals and vehicles.