Vision system for autonomous vehicles watches not just where pedestrians walk, but how

The University of Michigan, well known for its efforts in self-driving car tech, has been working on an improved algorithm for predicting the movements of pedestrians that takes into account not just what they’re doing, but how they’re doing it. This body language could be critical to predicting what a person does next.

Keeping an eye on pedestrians and predicting what they’re going to do is a major part of any autonomous vehicle’s vision system. Knowing that a person is present, and where, makes a huge difference to how the vehicle can operate. But while some companies advertise that they can see and label people at such and such a range, or under these or those conditions, few if any can see gestures and posture, or even claim to.

Such vision algorithms can (though nowadays are unlikely to) be as simple as identifying a human and seeing how many pixels it moves over a few frames, then extrapolating from there. But naturally human movement is a bit more complex than that.
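To make that simple baseline concrete, here is a minimal sketch of pixel-displacement extrapolation. It is only an illustration of the naive approach described above, not the University of Michigan method; the function name, the track format and the sample coordinates are all hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) pixel position of a detected person


def extrapolate_constant_velocity(track: List[Point], steps_ahead: int) -> List[Point]:
    """Predict future pixel positions by assuming the average per-frame
    displacement observed so far simply continues (hypothetical helper)."""
    if len(track) < 2:
        raise ValueError("need at least two observed positions")
    frames_elapsed = len(track) - 1
    # Average pixels moved per frame between the first and last observation.
    vx = (track[-1][0] - track[0][0]) / frames_elapsed
    vy = (track[-1][1] - track[0][1]) / frames_elapsed
    last_x, last_y = track[-1]
    # Step forward along a straight line from the last observation.
    return [(last_x + vx * k, last_y + vy * k) for k in range(1, steps_ahead + 1)]


# Made-up example: a person drifting right across three frames.
observed = [(100.0, 200.0), (104.0, 200.0), (108.0, 201.0)]
predicted = extrapolate_constant_velocity(observed, 2)
print(predicted)  # [(112.0, 201.5), (116.0, 202.0)]
```

A model like this knows nothing about where the person is looking or how they are moving their limbs, which is exactly the gap that pose and gait information is meant to fill.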

UM’s new system uses a vehicle’s lidar and stereo camera systems to estimate not just a person’s trajectory, but their pose and gait. Pose can indicate whether a person is looking towards or away from the car, or using a cane, or stooped over a phone; gait indicates not just speed but also intention.

Is someone glancing over their shoulder? Maybe they’re going to turn around, or walk into traffic. Are they putting their arms out? Maybe they’re signaling someone (or perhaps the car) to stop. This additional data helps a system predict motion and makes for a more complete set of navigation plans and contingencies.

Importantly, the system performs well with only a handful of frames to work with, perhaps comprising a single step and swing of the arm. That’s enough to make a prediction that handily beats simpler models, a critical measure of performance, since one cannot assume that a pedestrian will be visible for more than a few frames between obstructions.

Not too much can be done with this noisy, little-studied data right now, but perceiving and cataloguing it is the first step to making it an integral part of an AV’s vision system. You can read the full paper describing the new system in IEEE Robotics and Automation Letters or on arXiv (PDF).