From recording to reacting: Neural networks are changing notions of surveillance

There are an estimated 30 million surveillance cameras in the U.S. today. Of those, only about 5 percent are monitored by a human at any given time. The rest simply record footage, providing little value beyond evidence long after a crime or accident has occurred.

What this means is that today’s “security” systems are mostly just a vast network of evidence collection devices, constantly recording and dumping data into hard drives, only to be retrieved after something regrettable has happened. Ninety-five percent of all security cameras offer no real-time benefits — there simply aren’t enough eyeballs to go around.

What if we could put a pair of eyes onto every security camera in the U.S.?

Imagine if we could hire one person to monitor every camera in the United States. Rather than having a video recording network, we would have a vast real-time response system, able to alert emergency services to potential accidents before they happen, or stop a crime in progress.

A real-time network could cross-reference multiple feeds at the same time, potentially exposing highly sophisticated planned crimes and terrorist attacks similar to those that various countries have suffered in recent years.

Putting these 30 million installed cameras to work in a real-time system would make them far more useful than the mass-recording role they serve today.

More than four billion hours of surveillance footage is generated each week in the U.S.

If we wanted to monitor this network in real time, it would require employing more than 90 million Americans for the task — roughly half the country’s entire workforce. Clearly this is not a viable solution, but a new generation of smart cameras with embedded neural networks will soon be able to act on this real-time video data as if we really did have a 90 million-strong staff of security personnel operating around the clock.
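The staffing claim can be sanity-checked with simple arithmetic. The sketch below takes the article's four-billion-hour figure and assumes a standard 40-hour work week per monitor; the result is consistent with the "more than 90 million" estimate.

```python
# Back-of-envelope check of the staffing estimate.
# Figures: >4 billion hours of footage per week (from the text),
# 40-hour work week per full-time monitor (assumed).
FOOTAGE_HOURS_PER_WEEK = 4_000_000_000
HOURS_PER_WORK_WEEK = 40

monitors_needed = FOOTAGE_HOURS_PER_WEEK / HOURS_PER_WORK_WEEK
print(f"{monitors_needed:,.0f} full-time monitors")  # 100,000,000
```

One hundred million full-time monitors is roughly the scale the article describes, which is why automation, not hiring, is the only plausible path.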

Neural networks are a product of research in machine intelligence, a field of computing that seeks to bring more natural kinds of intelligence to our devices. Beyond the "number smarts" traditional computers are known for, neural network approaches are proving very good at helping computers understand the natural, unprepared and often non-uniform information of the world around them, such as the contents of a video or the topics and themes of a conversation: things we take for granted, but that are tremendously challenging for a computer.

Technology has the opportunity to improve these systems not by adding more cameras or recording more footage, but by making better use of the information in real time.

Rather than video being locked away and retrieved after a crime has happened, an intelligent system monitoring feeds in real time could detect accidents and crimes as they are about to occur (or at least while in progress). Airports, banks and schools could significantly reduce response times for emergency services, and possibly even prevent accidents and crimes before they happen. How would a machine intelligence system do such a thing?

Thanks to new approaches such as deep neural networks, we’re able to create extremely sophisticated detection systems that not only detect humans in a video feed (very useful for pedestrian avoidance), but also understand complex layers of information, such as behavior and body language.

Detection of behaviors is extremely valuable and can dramatically improve security systems. For example, Google has demonstrated success in detecting human body pose in images. Imagine a situation in a bank where silent alarms can be triggered not with the press of a button but automatically when the smart camera system detects aggressive body postures or running motions. The ability of these new systems to understand, flag or even respond to events makes security response more timely and less labor intensive at the same time.

[Image: Google's "DeepPose": pose estimation using deep learning.]
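To make the silent-alarm idea concrete, here is a minimal sketch of what the alert rule might look like downstream of any pose-estimation model (this is not Google's actual DeepPose API; the keypoint names, heuristic and debounce window are all assumptions for illustration). Given 2-D body keypoints per frame, a simple rule flags a posture such as raised arms and requires it to persist before alerting.

```python
# Hypothetical alert rule over pose-estimation output.
# Keypoints are (x, y) image coordinates; names and thresholds are illustrative.

def raised_arms(keypoints: dict) -> bool:
    """Flag a frame if both wrists are above the head (smaller y = higher)."""
    head_y = keypoints["head"][1]
    return (keypoints["left_wrist"][1] < head_y and
            keypoints["right_wrist"][1] < head_y)

def should_trigger_alarm(frames: list) -> bool:
    """Require the posture to persist across frames to reduce false alarms."""
    CONSECUTIVE_FRAMES = 3  # assumed debounce window
    streak = 0
    for keypoints in frames:
        streak = streak + 1 if raised_arms(keypoints) else 0
        if streak >= CONSECUTIVE_FRAMES:
            return True
    return False

# Toy frames: a person standing calmly, then raising both hands.
calm = {"head": (100, 50), "left_wrist": (80, 120), "right_wrist": (120, 120)}
hands_up = {"head": (100, 50), "left_wrist": (80, 30), "right_wrist": (120, 30)}
print(should_trigger_alarm([calm, hands_up, hands_up, hands_up]))  # True
```

A real system would replace the toy heuristic with a learned classifier over pose sequences, but the structure — per-frame inference feeding a temporal alert rule — is the same.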

Furthermore, facial recognition can rapidly separate authorized personnel from intruders, or match faces from multiple camera sources in order to track individuals from location to location. Researchers are even exploring systems that can detect the presence of concealed firearms or explosives based on the gait of an individual, or even their radar signature.
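Cross-camera matching of this kind is commonly built on face embeddings: a model maps each detected face to a vector, and two faces are treated as the same person when their vectors are sufficiently close. The sketch below assumes the embeddings already exist; the vectors and the similarity threshold are made up for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def same_person(emb_a, emb_b, threshold=0.8):
    """Assumed threshold; real systems tune this against labeled data."""
    return cosine_similarity(emb_a, emb_b) >= threshold

# Toy embeddings standing in for a face-recognition model's output.
camera_1_face = [0.9, 0.1, 0.4]
camera_2_face = [0.85, 0.15, 0.38]  # same individual, slightly different view
camera_2_other = [0.1, 0.9, 0.2]    # a different person

print(same_person(camera_1_face, camera_2_face))   # True
print(same_person(camera_1_face, camera_2_other))  # False
```

Matching by embedding distance is what lets a network of cameras hand off a track from one location to the next without a human in the loop.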

Do smart cameras bring privacy concerns?

Privacy is a major concern that comes up in these discussions. Of course, surveillance is a controversial topic that should not be swept under the rug. That being said, society appears to have come to the consensus that security cameras do belong in places such as airports, hospitals, banks and schools. These public spaces are already being constantly recorded, yet with little actual real-time responsiveness available.


Deploying intelligence

Companies are already leveraging massive amounts of cloud computing for the newest generation of smart cameras, but we are also beginning to see new levels of intelligence in the cameras themselves. Computation at the sensor, rather than in the cloud, massively reduces movement of data, and eases the complexity and cost of deploying connected devices in bandwidth-constrained areas.
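The bandwidth argument for computing at the sensor can be made concrete with rough numbers (all figures below are illustrative assumptions, not measurements): streaming compressed video to the cloud around the clock versus sending only small event messages when something is detected.

```python
# Rough comparison of per-camera daily bandwidth: cloud streaming vs. edge events.
# All figures are illustrative assumptions.

VIDEO_MBPS = 4                     # assumed compressed 1080p stream, megabits/s
SECONDS_PER_DAY = 24 * 60 * 60

EVENT_BYTES = 500                  # assumed size of one alert message
EVENTS_PER_DAY = 200               # assumed detections per camera per day

cloud_gb = VIDEO_MBPS * SECONDS_PER_DAY / 8 / 1000   # megabits -> gigabytes
edge_kb = EVENT_BYTES * EVENTS_PER_DAY / 1000        # bytes -> kilobytes

print(f"cloud streaming: {cloud_gb:.1f} GB/day")     # 43.2 GB/day
print(f"edge events:     {edge_kb:.0f} KB/day")      # 100 KB/day
```

Even with generous assumptions, on-camera inference cuts per-device traffic by several orders of magnitude, which is what makes deployments in bandwidth-constrained areas practical.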

What were a handful of research papers a decade ago are now active product development initiatives for security camera companies. What does that mean for the future of security? It means a world where cameras contribute to real-time safety, and it's not imprudent to imagine these systems eventually preventing crimes before they begin.