Follow The Data Down The Rabbit Hole

Mark Gazit Contributor

Editor’s note: Mark Gazit is one of the top cyber security experts in Israel. He has a long-standing reputation in the field of cyber defense, from his defensive cyber security service in the Israeli Air Force, through to his current role as the CEO of ThetaRay.

When it comes to analyzing big data, we are often urged to “follow the data,” “unlock its secrets,” analyze it and expect obscure truths to finally be revealed. While this is essentially true, data can tell many tales, and with the way analytics are used nowadays, those tales differ greatly depending on the human minds that interpret them.

Who asks the questions?

The question of human bias hangs like a shadow over the accuracy and efficiency of big data analytics, and thus over the viability of the answers obtained from them. If different humans can look at the same data and come to different conclusions, just how reliable can those deductions be?

There is no question that using data science to extract knowledge from raw data provides tremendous value and opportunity to organizations in any sector, but the way that data is analyzed has a crucial bearing on that value.

In order to extract meaningful answers from big data, data scientists must decide which questions to ask of the algorithms. However, as long as humans are the ones asking the questions, they will forever introduce unintentional bias into the equation. Furthermore, the data scientists in charge of choosing the queries are often much less equipped to formulate the “right questions” than the organization’s specialized domain experts.

For example, a compliance manager would ask much better questions about her area than a scientist who has no idea what her day-to-day work entails. The same goes for a CISO or the executive in charge of insider threats. Does this mean that your data team will have to involve more people all the time? And what happens if one of those people leaves the company?

Data science is necessary and important, and as data grows, so does the need for experienced data scientists. But at the same time, leaving all the computational work to humans makes it slower, less scientific, and quick to degrade in quality because the human mind cannot keep up with the quantum leap that big data is undergoing.

The scalability issue

Scalability is an urgent and rapidly growing problem for big data and data science. According to research by the McKinsey Global Institute (MGI) and McKinsey: “The United States alone faces a shortage of 140,000 to 190,000 people with analytical expertise and 1.5 million managers and analysts with the skills to understand and make decisions based on the analysis of Big Data.” Data scientists are already in very short supply, while the amount of data that organizations generate and wish to leverage only grows, as every industry from healthcare to critical infrastructure looks to big data to accelerate business and solve problems.

A joint research study by GE and Accenture states that “80-90% of companies across the industries surveyed indicated that big data analytics is either the top priority for the company or in the top three.” Furthermore, “53% of senior executives of industrial companies around the world say Big Data analytics is now a board level initiative.” At this rate, can enough data scientists become qualified and experienced quickly enough to respond to the galloping needs companies already have for real-time analytics? Clearly not. Scalability is already a major issue that needs to be solved—fast.

Luckily, a solution to this issue is already within reach.

Machine learning

Rather than requiring data scientists to analyze and query big data, a wiser and more efficient way to maximize its benefit is to leave the detection phase to machine learning. Data scientists would then merely have to examine and classify the results: the anomalies, events and issues whose significance only human eyes can judge.
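To make that division of labor concrete, here is a minimal sketch in Python. It is not the method of any particular product or vendor: the function name, the sample data and the threshold are all illustrative assumptions, and a simple robust statistic (the modified z-score, based on the median) stands in for a real machine learning model. The code does the detection; a person would only review what it flags.

```python
from statistics import median


def flag_anomalies(values, threshold=3.5):
    """Return the indices of values that look anomalous.

    Uses the modified z-score (median and median absolute deviation),
    a simple stand-in for a real detection model. The 3.5 threshold is
    a common convention, treated here as an assumption.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # No spread in the data, so nothing can be scored as unusual.
        return []
    return [
        i for i, v in enumerate(values)
        if 0.6745 * abs(v - med) / mad > threshold
    ]


# Hypothetical data: daily login counts for one account.
daily_logins = [102, 98, 101, 97, 103, 99, 100, 480, 101, 96]

# The machine does the detection; a human reviews only what is flagged.
review_queue = [
    {"day": i, "logins": daily_logins[i], "label": None}
    for i in flag_anomalies(daily_logins)
]
print(review_queue)
```

In this sketch nobody has to decide in advance which question to ask of the data; the reviewer's job shrinks to labeling the single flagged day as a threat, an error or a harmless spike.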

Thanks to technological breakthroughs that are already available today, highly sophisticated analytic algorithms can automatically detect or predict problems by analyzing very large amounts of highly complex data in near real time, without human bias, a heavy time commitment or excessive false positives.

The Holy Grail here is automating analytics in a way that is reliable, accurate and relevant to different organizational needs, without necessitating any manual intervention. Solutions that offer this type of no-fuss big-data capability can reduce or eliminate the detection work required of data scientists, allowing them to focus on results. This in turn reduces costs for the organization and minimizes long-term spending on the ongoing deployment of any big-data solution it has in place.

Ultimately, by solving the issues that prevent full optimization of big data analytics — especially the human factor and its disproportionate impact on the current-day process — organizations will be able to detect and address all types of threats and opportunities much more rapidly. This capability is becoming increasingly crucial in an era when data is being generated by both humans and machines, and is sure to become a pivotal way for businesses to create situational awareness, detect issues and optimize operations to achieve their business objectives.