Here's how Uber's self-driving cars are supposed to detect pedestrians

A self-driving vehicle made by Uber has struck and killed a pedestrian. It’s the first such incident and will certainly be scrutinized like no other autonomous vehicle interaction in the past. But on the face of it, it’s hard to understand how, short of a total system failure, this could happen, when the entire car has essentially been designed around preventing exactly this situation from occurring.

Something unexpectedly entering the vehicle’s path is pretty much the first emergency event that autonomous car engineers look at. The situation could be many things — a stopped car, a deer, a pedestrian — and the systems are one and all designed to detect them as early as possible, identify them and take appropriate action. That could be slowing, stopping, swerving, anything.

Uber’s vehicles are equipped with several different imaging systems that perform both ordinary duty (monitoring nearby cars, signs and lane markings) and extraordinary duty like that just described. No fewer than four different ones should have picked up the victim in this case.

Top-mounted lidar. The bucket-shaped item on top of these cars is a lidar, or light detection and ranging, system that produces a 3D image of the car’s surroundings multiple times per second. Using infrared laser pulses that bounce off objects and return to the sensor, lidar can detect static and moving objects in considerable detail, day or night.

[Image: an example of lidar-created imagery, though not specifically what the Uber vehicle would have seen.]

Heavy snow and fog can obscure a lidar’s lasers, and its accuracy decreases with range, but for anything from a few feet to a few hundred feet, it’s an invaluable imaging tool and one that is found on practically every self-driving car.

The lidar unit, if operating correctly, should have been able to make out the person in question, if they were not totally obscured, while they were still more than a hundred feet away, and passed on their presence to the “brain” that collates the imagery.
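The ranging principle behind lidar comes down to simple arithmetic: time how long a laser pulse takes to come back, and halve the distance light covers in that time. Here is a minimal sketch of that calculation. The function names are illustrative only and are not taken from Uber's software or any lidar vendor's API.

```python
# Speed of light in a vacuum, metres per second (close enough for air).
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def range_from_round_trip(round_trip_seconds: float) -> float:
    """Distance in metres to a reflecting object, given the pulse's round-trip time."""
    # The pulse travels out and back, so the one-way distance is half the path.
    return SPEED_OF_LIGHT_M_PER_S * round_trip_seconds / 2.0


def round_trip_for_range(distance_m: float) -> float:
    """Inverse of the above: how long a pulse takes to return from distance_m."""
    return 2.0 * distance_m / SPEED_OF_LIGHT_M_PER_S
```

A pulse that returns after about 200 nanoseconds has bounced off something roughly 30 meters, or about 100 feet, away. That is why the timing electronics in these units have to be so precise.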

Front-mounted radar. Radar, like lidar, sends out a signal and waits for it to bounce back, but it uses radio waves instead of light. This makes it more resistant to interference, since radio can pass through snow and fog, but also lowers its resolution and changes its range profile.

Tesla’s Autopilot relies mostly on radar.

Depending on the radar unit Uber employed — likely multiple in both front and back to provide 360 degrees of coverage — the range could differ considerably. If it’s meant to complement the lidar, chances are it overlaps considerably, but is built more to identify other cars and larger obstacles.

The radar signature of a person is not nearly so recognizable, but it’s very likely they would have at least shown up, confirming what the lidar detected.

Short and long-range optical cameras. Lidar and radar are great for locating shapes, but they’re no good for reading signs, figuring out what color something is and so on. That’s a job for visible-light cameras with sophisticated computer vision algorithms running in real time on their imagery.

The cameras on the Uber vehicle watch for telltale patterns that indicate braking vehicles (sudden red lights), traffic lights, crossing pedestrians and so on. Especially on the front end of the car, multiple angles and types of camera would be used, so as to get a complete picture of the scene into which the car is driving.

Detecting people is one of the most commonly attempted computer vision problems, and the algorithms that do it have gotten quite good. “Segmenting” an image, as it’s often called, generally also involves identifying things like signs, trees, sidewalks and more.

That said, it can be hard at night. But that’s an obvious problem, the answer to which is the previous two systems, which work night and day. Even in pitch darkness, a person wearing all black would show up on lidar and radar, warning the car that it should perhaps slow and be ready to see that person in the headlights. That’s probably why a night-vision system isn’t commonly found in self-driving vehicles (I can’t be sure there isn’t one on the Uber car, but it seems unlikely).

Safety driver. It may sound cynical to refer to a person as a system, but the safety drivers in these cars are very much acting in the capacity of an all-purpose failsafe. People are very good at detecting things, even though we don’t have lasers coming out of our eyes. And our reaction times aren’t the best, but if it’s clear that the car isn’t going to respond, or has responded wrongly, a trained safety driver will react correctly.

Worth mentioning is that there is also a central computing unit that takes the input from these sources and creates its own more complete representation of the world around the car. A person may disappear behind a car in front of the system’s sensors, for instance, and no longer be visible for a second or two, but that doesn’t mean they ceased existing. This goes beyond simple object recognition and begins to bring in broader concepts of intelligence such as object permanence, predicting actions and the like.
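To give a rough idea of what "object permanence" can mean in software, here is a toy sketch that keeps tracking a pedestrian through a brief occlusion by assuming they keep moving at their last known speed. Everything in it is an assumption made for illustration (the `Track` class, the `update` function, the one-dimensional position, the 20-frame limit). Real systems use far more sophisticated filters, and nothing here reflects Uber's actual code.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical limit: about 2 seconds of occlusion at 10 frames per second.
MAX_MISSED_FRAMES = 20


@dataclass
class Track:
    x: float         # position along one axis, metres
    vx: float        # estimated velocity, metres per second
    missed: int = 0  # consecutive frames with no detection


def update(track: Track, detection_x: Optional[float], dt: float) -> bool:
    """Advance the track by one frame of duration dt seconds.

    detection_x is None when the object is hidden from the sensors.
    Returns False once the object has been unseen too long to keep tracking.
    """
    if detection_x is None:
        # Occluded: coast on the last known velocity instead of forgetting.
        track.x = track.x + track.vx * dt
        track.missed += 1
    else:
        # Seen again: refresh the velocity estimate and the position.
        track.vx = (detection_x - track.x) / dt
        track.x = detection_x
        track.missed = 0
    return track.missed <= MAX_MISSED_FRAMES
```

The point of the design is that a missing detection does not delete the object. The track simply coasts on its prediction until the sensors see the person again or too much time has passed.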

It’s also arguably the most advanced and closely guarded part of any self-driving car system and so is kept well under wraps.

It isn’t clear what the circumstances were under which this tragedy played out, but the car was certainly equipped with technology that was intended to detect the person and cause the car to react appropriately, and that should have done so. Furthermore, if one system didn’t work, another should have sufficed; multiple fallbacks are only sensible in high-stakes matters like driving on public roads.
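The idea of redundancy can be sketched in a few lines: if any working sensor reports an obstacle, the car reacts, and a failed sensor simply drops out of the vote. This is a deliberately simplified illustration. The function names and the true/false/`None` report format are assumptions, and how Uber's software actually weighs its sensors is not public.

```python
from typing import Dict, Optional


def should_react(reports: Dict[str, Optional[bool]]) -> bool:
    """Decide whether to react to a possible obstacle.

    reports maps a sensor name to True (obstacle seen), False (path clear)
    or None (sensor failed or gave no reading). Failed sensors are ignored.
    """
    return any(seen is True for seen in reports.values())


def confirmations(reports: Dict[str, Optional[bool]]) -> int:
    """Count how many sensors independently report the obstacle."""
    return sum(1 for seen in reports.values() if seen is True)
```

With reports such as `{"lidar": None, "radar": True, "camera": False}`, the lidar has failed and the camera sees nothing in the dark, yet `should_react` still returns `True` because the radar alone is enough.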

We’ll know more as Uber, local law enforcement, federal authorities and others investigate the accident.
