What your security scientists can learn from your data scientists to improve cybersecurity

Michael Schiebel Contributor

Michael Schiebel is GM of Cybersecurity Industry at Hortonworks.

Security remains one of the top unresolved challenges for businesses. Billions of dollars have been spent on security technology over the last 30 years, yet hackers seem to be more successful than ever. Every organization is now under extreme threat, all the time.

Today, hacking is a much more complex art than it used to be: it no longer involves just scanning the network and penetrating it via a vulnerability. The traditional security tools most companies use, however, are often inadequate because they still focus on that initial intrusion and ignore what is now a very complex post-compromise chain of events.

Most tools are still rule-based, relying on signatures and on detection and response rules. That's their downfall. Modern attack chains have many more phases: reconnaissance, exploitation, privilege escalation, lateral movement inside the network, data exfiltration and persistent access over time.

Because much of the innovation in big data and security analytics now happens in the open source world, security professionals would do well to pay attention to some lessons data scientists have already learned.

Focus on the abnormalities

Data science is all about creating structure from unstructured data and labeling it so you can compare normal versus abnormal patterns with machine learning or deep learning algorithms. Whether it's clickstream advertising, buyer sentiment analysis, facial recognition, predicting a pandemic or modeling the spread of malware through a network, it's the same basic data science. What changes is the type of pattern you detect.


Take customer sentiment analysis, for example, where we are looking for the “normal” buying behaviors of our customers. How do they interact with us? What is normal? It’s all about ignoring the abnormalities and the edge cases and being able to classify the types of normal behavior.

Now, what if I wanted to understand online credit card fraud for the same set of buyers? I'd want to focus on the abnormalities. It's the same data, the same techniques and the same analytical models; you simply choose to focus on the outliers rather than the normal. The lesson for security vendors and security professionals is that they can use the same data and essentially the same algorithms, just with the focus shifted.
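To make the "same data, same model, different focus" idea concrete, here is a minimal sketch in Python. It assumes a simple z-score model of normal behavior and uses made-up purchase amounts; the function name `split_normal_and_outliers` and the threshold of 2.0 are illustrative choices, not anything the author or a specific product prescribes.

```python
from statistics import mean, stdev

def split_normal_and_outliers(values, threshold=2.0):
    """Profile 'normal' with the mean and sample standard deviation, then split by z-score."""
    mu = mean(values)
    sigma = stdev(values)
    normal, outliers = [], []
    for v in values:
        z = (v - mu) / sigma if sigma else 0.0
        # Same model, two lenses: |z| <= threshold is "normal", anything beyond is an outlier
        (outliers if abs(z) > threshold else normal).append(v)
    return normal, outliers

# Hypothetical purchase amounts for one group of buyers
purchases = [42.0, 38.5, 51.0, 45.2, 39.9, 47.3, 44.1, 40.8, 2499.0, 43.6]

normal, outliers = split_normal_and_outliers(purchases)
print("Marketing lens (typical behavior):", normal)
print("Fraud lens (outliers):", outliers)
```

A marketing team would keep `normal` and discard the rest; a fraud team runs the identical code and keeps `outliers`. A z-score is the simplest possible profile, and a single extreme value inflates the standard deviation, so real systems often prefer median-based or learned models.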

Use ALL the data

This is a fundamental lesson for data scientists that may not be obvious to security professionals: for a security solution to detect ALL behavioral changes, you must run your machine learning algorithms against raw activity, not just a pre-filtered event stream.

You cannot build analytical models and behavioral profiles that distinguish abnormal behavior if you cannot observe raw behavior in the first place. If you plan to perform security analytics on the alerts generated by a traditional security product, you are on the wrong track. Too many security tools are built in a test tube. So it's important to examine how security analytics solutions collect data, what they collect and whether they provide a truly raw, unfiltered feed of activity, both at rest and in motion.
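The gap between alerts and raw activity can be illustrated with a small sketch. It assumes hypothetical authentication events and a hypothetical signature-style rule that alerts only on repeated failed logins; a simple behavioral check over the same raw events flags a user whose credentials suddenly touch hosts never seen in that user's baseline, a pattern consistent with lateral movement. None of these names come from a real product.

```python
from collections import defaultdict

# Hypothetical raw authentication events: (day, user, host, succeeded)
raw_events = [
    (1, "alice", "ws-01", True), (1, "alice", "fs-01", True),
    (2, "alice", "ws-01", True), (2, "alice", "fs-01", True),
    (1, "bob", "ws-02", True), (1, "bob", "ws-02", False),
    (2, "bob", "ws-02", True),
    # Day 3: alice's credentials quietly reach many new hosts, and every login succeeds
    (3, "alice", "ws-01", True), (3, "alice", "db-01", True),
    (3, "alice", "db-02", True), (3, "alice", "hr-01", True),
    (3, "alice", "fin-01", True),
    (3, "bob", "ws-02", True),
]

def rule_based_alerts(events, max_failures=5):
    """Hypothetical signature-style rule: alert only on repeated failed logins per user per day."""
    failures = defaultdict(int)
    for day, user, _host, ok in events:
        if not ok:
            failures[(day, user)] += 1
    return [key for key, count in failures.items() if count >= max_failures]

def new_host_anomalies(events, today, baseline_days):
    """Behavioral check on raw events: hosts a user touched today but never during the baseline."""
    baseline_hosts = defaultdict(set)
    today_hosts = defaultdict(set)
    for day, user, host, _ok in events:
        if day in baseline_days:
            baseline_hosts[user].add(host)
        elif day == today:
            today_hosts[user].add(host)
    anomalies = {}
    for user, hosts in today_hosts.items():
        new_hosts = hosts - baseline_hosts[user]
        if new_hosts:
            anomalies[user] = sorted(new_hosts)
    return anomalies

print(rule_based_alerts(raw_events))
print(new_host_anomalies(raw_events, today=3, baseline_days={1, 2}))
```

The rule produces no alerts at all, so any analytics built only on its output would see nothing. The raw feed, by contrast, shows alice reaching `db-01`, `db-02`, `fin-01` and `hr-01` for the first time.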

Automate, automate, automate

The real problem in most organizations is that too much security alert data is coming in too fast. Incident response teams are too small and too overwhelmed to effectively monitor, triage and address security incidents.

I have spoken with companies generating hundreds of thousands of alerts per second. But suppose, hypothetically, that a large company generates only 100,000 alerts per day. A four-person incident response team spending an average of 30 minutes per case could triage only 16 alerts per person in an eight-hour day, or a whopping 64 alerts per team, daily. The vast majority of alerts would remain unexamined, which helps explain why compromises can run for an average of 145 days before anyone notices. For all intents and purposes, many organizations are not looking.
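The arithmetic behind those figures is easy to check. This short sketch assumes each analyst spends a full eight-hour day on triage (the assumption implied by 16 alerts at 30 minutes each) and uses the article's hypothetical 100,000 alerts per day; `triage_capacity` is an illustrative helper.

```python
def triage_capacity(team_size, minutes_per_alert, hours_per_day=8):
    """Alerts triaged per day, assuming each analyst spends hours_per_day on triage."""
    per_person = (hours_per_day * 60) // minutes_per_alert
    return per_person, per_person * team_size

alerts_per_day = 100_000  # the article's hypothetical daily volume
per_person, per_team = triage_capacity(team_size=4, minutes_per_alert=30)
coverage = per_team / alerts_per_day
print(per_person, per_team, f"{coverage:.3%}")  # prints: 16 64 0.064%
```

In other words, under these assumptions the team looks at well under one tenth of one percent of the day's alerts, and 99,936 go unexamined.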

How can security teams process a greater percentage of alerts per day, or prioritize the potentially most severe ones? The answer is to aggressively automate the detection of and response to alerts, cutting a 30-minute process per event down to two minutes.
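One way to picture automated triage is a scoring step that auto-closes low-priority alerts and hands analysts only the highest-priority ones that fit their capacity. The sketch below is purely illustrative: the `Alert` fields, the `priority` formula and the auto-close threshold are hypothetical assumptions, not a method from the article or from any particular product.

```python
import heapq
from dataclasses import dataclass

@dataclass
class Alert:
    alert_id: str
    severity: int           # 1 (low) to 5 (critical), as reported by the source tool
    asset_criticality: int  # 1 to 5, from a hypothetical asset inventory
    anomaly_score: float    # 0.0 to 1.0, from a hypothetical behavioral model

def priority(alert: Alert) -> float:
    """Hypothetical scoring rule: weight tool severity by asset value and behavioral anomaly."""
    return alert.severity * alert.asset_criticality * (0.5 + alert.anomaly_score)

def automated_triage(alerts, analyst_capacity, auto_close_below):
    """Auto-close low-priority alerts, then queue the highest-priority rest up to analyst capacity."""
    candidates = [a for a in alerts if priority(a) >= auto_close_below]
    for_humans = heapq.nlargest(analyst_capacity, candidates, key=priority)
    auto_closed = len(alerts) - len(candidates)
    return for_humans, auto_closed

# At two minutes per alert, four analysts working eight hours each can handle 4 * 240 = 960 alerts/day
daily_capacity = 4 * (8 * 60 // 2)

alerts = [
    Alert("a1", severity=2, asset_criticality=1, anomaly_score=0.1),
    Alert("a2", severity=5, asset_criticality=5, anomaly_score=0.9),
    Alert("a3", severity=3, asset_criticality=4, anomaly_score=0.2),
    Alert("a4", severity=1, asset_criticality=2, anomaly_score=0.0),
]

queue, closed = automated_triage(alerts, daily_capacity, auto_close_below=5.0)
print([a.alert_id for a in queue], closed)
```

Here `a2` and `a3` reach an analyst, most severe first, while `a1` and `a4` are closed automatically. Alerts that clear the threshold but do not fit within capacity would still wait, so in practice the threshold and the capacity have to be tuned together.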

By focusing on the abnormalities, using all available data and integrating and automating where possible, incident response teams and organizations will be better able to tackle modern complex attack chains.