Why metadata should not live forever

Nico Sell

Contributor

Nico Sell is the founder of Wickr Foundation, and co-founder and co-chairman at Wickr — a self-destructing, secure, private, anonymous messaging service.

The global surge in encrypted traffic and the wide adoption of end-to-end encryption by mainstream tech companies mark a transformative shift in information security worth celebrating. Billions of online users now enjoy default peer-to-peer security, shielding the content of web communications from the prying eyes of criminals and corporate surveillance.

Yet the industry continues to collect and store massive amounts of metadata associated with every digital transaction — conversations, purchases, data transfers. These extensive historical accounts of personal or business activities live forever, and are shared and analyzed outside of user control, becoming a breeding ground for the next wave of cyber risks at all levels — reputational, financial and national security.

It’s only metadata, nothing to see here

We have been led to believe that metadata — or rather, activity logs — is nothing to worry about; it’s only the content that matters. This may have been true a couple of decades ago when the frequency of digital communications between people and systems was minimal and storage prohibitively expensive. Today, metadata collection and mining has become an industry of its own — accumulating and matching information across countless databases to produce detailed records of everyone’s activities and associations. The goals range from targeting users with relevant advertising to behavioral pattern recognition to aimless harvesting of records for yet unknown future use.

Every technology and service we use, from banking to communications to transport, combined with the massive visual surveillance we encounter daily, generates a historically unprecedented amount of information about our whereabouts, mapping out countless connections between people, businesses, locations and things.

In practical terms, the depth and historic nature of metadata collection is similar to having someone follow you around 24/7, online or offline, recording everything you do and who you do it with, only stopping short of listening to your conversations. This clearly contradicts the dominant public narrative that metadata alone cannot be used to infer specific sensitive details about you.

With the Internet of Things bringing billions of new devices online in the next few years — from cars to smart homes to public utilities and healthcare systems — even more metadata will be fed into the global commercial databases, adding yet another rich and often unprotected layer of information about organizations, individuals and nations.

Today’s corporate data collection, particularly of metadata, is easy and cheap, and it often occurs without meaningful user input and proper informed consent. Most people don’t know where their personal or business activity logs reside and for how long, how they are shared, what conclusions are derived from this data and how it may impact their personal lives or business prospects.

Blurring lines between content and metadata

“We kill based on metadata,” an infamous statement by former NSA director Michael Hayden, is a reflection of the intelligence community’s understanding that activity logs have become so exhaustive that they are just as powerful in providing insight into people’s lives and minds as the content of their communications.

A new study by Stanford University found “telephone metadata densely interconnected, susceptible to re-identification, and enabling highly sensitive inferences.” When metadata is used and correlated with other open-source data without any restrictions, it can reveal profoundly intimate information about individuals. And, unlike the content of digital communications, it is not protected under the Fourth Amendment and can be surprisingly trivial to obtain without a warrant.

Our national policy discourse, so intensely focused on the precedence of digital content over metadata, only further exacerbates the imbalance in how private industry, from global corporations to small startups, treats these two types of data. Most activity logs across global databases, as massive as they are, are stored unencrypted without many safeguards against exposure, and they are not properly secured or anonymized when shared with third parties.

Collecting and storing any information, metadata included, in an insecure way clearly fails the duty of care companies owe to their users. As a result, the global attack surface is rapidly expanding, exposing individuals, organizations and government systems to vulnerabilities that lead to unauthorized collection and use of sensitive data.

Digital toxic waste: Why metadata should not live forever

With no defense being 100 percent impenetrable, private companies, as the predominant data collectors and custodians of information, need to begin thinking long-term about why and how they collect and store our activity logs. When it becomes almost impossible to secure such large data sets, they turn into hazardous waste and a cause for user distrust rather than a source of cash flow.

Think about what you can learn about a person or a company by simply looking through their activity logs across different networks — the answer is likely “too much.” While some data — content or otherwise — may need to be retained for several years for compliance or other reasons, there is a lot more information that does not need to live forever. The less time the metadata lives and the fewer servers it touches, the more secure we all are against targeted criminal attacks and cyber espionage.

As information security becomes a national priority with cyber threats reaching epidemic proportions, both the tech community and policy makers must make it significantly harder and exponentially more expensive to exploit networks and databases containing activity logs.

Here is an easy fix: Limit metadata collection to what is essential to your business and retain it only for a short period of time. In addition, anonymize and encrypt the data, and follow responsible information disposal processes.
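
As a rough illustration of that fix, here is a minimal Python sketch of collecting less, pseudonymizing identifiers and purging on a schedule. Everything in it is an assumption made for illustration: the field names, the 30-day window and the helper functions are hypothetical, and a keyed hash only pseudonymizes an identifier, which is weaker than true anonymization. Encryption at rest is left out because it depends on the storage system in use.

```python
import hashlib
import hmac
import time

# Hypothetical policy values; pick these per business and compliance needs.
ESSENTIAL_FIELDS = {"event", "timestamp"}
RETENTION_SECONDS = 30 * 24 * 3600  # assumed 30-day retention window


def pseudonymize(user_id: str, key: bytes) -> str:
    """Replace a user identifier with a keyed hash (HMAC-SHA256)."""
    return hmac.new(key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()


def minimize(record: dict, key: bytes) -> dict:
    """Keep only the essential fields and swap the user ID for a pseudonym."""
    kept = {k: v for k, v in record.items() if k in ESSENTIAL_FIELDS}
    kept["user"] = pseudonymize(record["user_id"], key)
    return kept


def purge_expired(records: list, now: float) -> list:
    """Drop every record older than the retention window."""
    cutoff = now - RETENTION_SECONDS
    return [r for r in records if r["timestamp"] >= cutoff]


if __name__ == "__main__":
    key = b"demo-key-only"  # placeholder; a real key belongs in a secrets store
    now = time.time()
    raw = [
        {"user_id": "alice", "event": "login", "timestamp": now - 10,
         "ip": "203.0.113.7"},
        {"user_id": "bob", "event": "login",
         "timestamp": now - 90 * 24 * 3600, "ip": "203.0.113.8"},
    ]
    stored = purge_expired([minimize(r, key) for r in raw], now)
    print(stored)  # only the recent record remains, without the IP address
```

The design choice worth noting is that minimization happens before anything is stored, so fields such as the IP address never reach the database, and the purge is a routine job rather than a one-off cleanup.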

So long as we keep historically detailed activity logs across services, private or public, without effective means to clear data that is no longer needed or cannot be secured, encryption remains a half-measure, giving only a temporary and illusory sense of security.
