Data is the world’s most valuable (and vulnerable) resource

There's a gaping hole in the data loss prevention market

There’s no overstating it: 2020 was a hell of a year. When future generations learn about 2020, the pandemic, social tension and political unrest will take up most of the oxygen. But for those learning about the history of cybersecurity, 2020 and a midsize company from Austin, Texas — SolarWinds — will take center stage.

Malicious code in a single update from a trusted software provider was the Trojan horse that enabled access to petabytes of private data across 18,000 organizations, including Fortune 500 companies and government entities.

Why will SolarWinds be so generationally important, and why am I talking about it? Because the large (and growing) impact of the hack and the substantial (and mounting) losses mean that every business leader must acknowledge what many in cybersecurity have been saying: cyber strategy is company strategy. It is not a periodic audit but a core part of C-suite strategy, with best practices that range from employee onboarding to mundane everyday coding.

I believe generational startups will be created from this reckoning with cybersecurity, just as they’ve been created coming out of market disruptions in the past. I’ve been thinking about this for a while, but it is clearer than ever that cyber will go on a tear this next decade.

Forecasts suggest $100 billion of new market value by 2025 alone, putting total market size at close to $280 billion, but I think this figure is conservative. Cyber is — and will be — a massive business.

One key driver of growth in the cyber market is really easy to understand, but really hard to solve for: data. Cyber is often a second-order value proposition, after speed of development, managing IT assets or data. We’re familiar with the idea that “data is the new oil.” Since that phrase was coined by mathematician Clive Humby 15 years ago, the total amount of data in the world has increased 74x.

By 2025, IDC forecasts the data universe will consist of 175 zettabytes. In case you don’t know, one zettabyte is 1 trillion gigabytes. If you were to download 175 zettabytes of data on your computer, it would take you 1.8 billion years. Mind-boggling!
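The arithmetic behind those numbers is easy to check. Here is a minimal sketch, assuming decimal (SI) units and a steady 25 megabit-per-second connection; the connection speed is an assumption chosen for illustration, since no speed is stated above, and the function names are hypothetical.

```python
# Back-of-the-envelope check of the zettabyte figures.
# ASSUMPTIONS: decimal (SI) units and a constant 25 Mbps download speed.
BYTES_PER_GIGABYTE = 10**9
BYTES_PER_ZETTABYTE = 10**21
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60


def gigabytes_per_zettabyte() -> int:
    """Return how many gigabytes fit in one zettabyte."""
    return BYTES_PER_ZETTABYTE // BYTES_PER_GIGABYTE


def download_years(zettabytes: float, megabits_per_second: float) -> float:
    """Return the years needed to download the given volume at a constant speed."""
    total_bits = zettabytes * BYTES_PER_ZETTABYTE * 8
    seconds = total_bits / (megabits_per_second * 10**6)
    return seconds / SECONDS_PER_YEAR


if __name__ == "__main__":
    print(f"{gigabytes_per_zettabyte():,} gigabytes per zettabyte")
    print(f"{download_years(175, 25) / 1e9:.2f} billion years to download 175 ZB")
```

Under those assumptions the result lands just under 1.8 billion years, in line with the figure quoted above. A faster or slower connection scales the answer proportionally.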

And it only increases exponentially from here. From likes, posts, profile views, follows and RTs for end consumers to time on site, conversion rate and bounce rate for websites to events, errors and anomaly tracking in IoT — all of this data is logged and tracked. We’ve seen billion-dollar companies built, taken public and acquired that ingest and visualize all of the data we capture.

The next generation of API startups is valuable in proportion to its ability to “talk” with apps in the ecosystem by sharing and ingesting data.

Of course, data makes us smarter. But with this proliferation, we’re seeing the downside of all of this data, or what Nick Halstead, founder of InfoSum (an Upfront portfolio company), observed about “data as oil”: “It’s sticky and gets all over the place.” As I’ve written before, improperly storing data isn’t new. What is new(ish) is how much harder security becomes given the immense quantity of data that exists and all of the different places you can put it.

This is made harder still by over 40% of the global workforce working from home on insecure networks, using devices that are part of “bring your own device” programs, and creating, accessing and storing data in multicloud environments. Gone are the days of fencing the perimeter, securing the devices used to access the network, and relying on database administrators to act as data protectors and gatekeepers.

If you don’t know what data you have, let alone know where it is, you can’t protect it. Even if you have tools alerting you to vulnerabilities, you can’t trust that these alerts really are your top priority if your tools don’t have a complete view of the universe of data.

Governments are getting involved to help enforce better behavior. The GDPR in Europe in 2018 and CCPA in California in 2020 are the first of what’s coming to the rest of the U.S. and the world. While each privacy act will have nuances, the general purpose is the same: Give consumers greater ability to opt in/out of what is shared and captured by companies with which they interact, and fine organizations that don’t comply.

All of these factors push every organization to ask some questions: Where is all of our data, actually? How do we make sure it’s secure? How long should we hold on to it? Do we really need all of the data we have? At what point should we delete it?

Legacy tools assess compliance and security periodically, like a financial audit, but only for data in known locations (it is, after all, very challenging to find something you don’t know you are looking for). They are also typically set up for structured rather than unstructured data (such as data sitting in lakes). Consequently, here are some quotes we have heard in the industry about current data loss prevention (DLP) tools:

  • “DLP is the biggest unsolved problem in security.”
  • “Nothing out there does data system discovery.”
  • “Data discovery … I get asked about it … there’s nothing.”

The massive amount of data that enterprises sit on today requires a new approach. This is not lost on founders and investors. It’s getting a lot of attention in startup land. Wiz, a cybersecurity company out of Israel, raised a $100 million Series A from Index, Sequoia, Insight and others in December 2020. Its platform provides a visual representation of your cloud deployment across cloud service providers and levels of the tech stack (e.g., infrastructure, platform, containers, workloads) and generates a risk-weighted view of vulnerabilities.

Open Raven (an Upfront portfolio company) raised a $15 million Series A led by Kleiner Perkins in June 2020. It is building a data-wrangling solution on the belief that no organization actually knows where all of its data lives. First, it inventories all of your data; then it classifies that data to determine what you actually have and what is high risk.

The Open Raven team believes you can’t afford to care about each and every one of the hundreds of millions of objects you store, just as you can’t care about the hundreds of alerts you get in security automation centers.

You have to winnow it down using heuristics and rules to isolate what you really care about. Open Raven also sets companies up to scale cost-effectively across different cloud environments, for both structured and unstructured data. Wiz and Open Raven are coming at the data visibility challenge from different angles, and there are a lot of other solutions needed in this space.
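To make the inventory, classify and winnow idea concrete, here is a minimal sketch of rule-based triage. It is purely illustrative and is not how Open Raven or any other vendor works: the rules, risk weights, threshold and object locations are all hypothetical, and a real system would read samples from actual storage rather than from strings.

```python
import re
from dataclasses import dataclass

# HYPOTHETICAL rules: name -> (pattern, risk weight). Illustrative only.
RULES = {
    "email": (re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"), 1),
    "us_ssn": (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), 5),
}


@dataclass
class StoredObject:
    location: str  # where the object lives (hypothetical URI)
    sample: str    # small text sample read from the object


def risk_score(obj: StoredObject) -> int:
    """Classify: add up the weight of every rule whose pattern appears in the sample."""
    return sum(weight for pattern, weight in RULES.values() if pattern.search(obj.sample))


def winnow(inventory: list[StoredObject], threshold: int = 4) -> list[tuple[int, str]]:
    """Keep only objects at or above the threshold, highest risk first."""
    scored = [(risk_score(obj), obj.location) for obj in inventory]
    return sorted((item for item in scored if item[0] >= threshold), reverse=True)


if __name__ == "__main__":
    inventory = [
        StoredObject("s3://logs/app.log", "GET /health 200"),
        StoredObject("s3://exports/users.csv", "jane@example.com,123-45-6789"),
        StoredObject("s3://marketing/list.csv", "bob@example.com"),
    ]
    for score, location in winnow(inventory):
        print(score, location)
```

The point of the threshold is the same as in the text: out of everything inventoried, only the handful of objects that trip high-weight rules get surfaced for attention.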

Just as many of us will use the pandemic as a reference point in time, I believe we will come to view cybersecurity before SolarWinds (BSW) and after SolarWinds (ASW). Cybersecurity has been consuming more and more of my passion and interest because it encompasses and impacts so much business strategy.

It is often hidden in what we view as device management companies or data companies, but it is becoming front and center as key to unlocking the power of the great migration to the cloud. I’m betting that the ASW period will be the most fertile in the history of the cybersecurity market — and I’m excited to be part of it.