Why metadata should not live forever

8:00 PM PDT • July 17, 2016

Nico Sell

Contributor

Nico Sell is the founder of Wickr Foundation, and co-founder and co-chairman at Wickr — a self-destructing, secure, private, anonymous messaging service.

It’s only metadata, nothing to see here

We have been led to believe that metadata, or rather activity logs, is nothing to worry about; it’s only the content that matters. This may have been true a couple of decades ago, when digital communications between people and systems were infrequent and storage was prohibitively expensive. Today, metadata collection and mining have become an industry of their own, accumulating and matching information across countless databases to produce detailed records of everyone’s activities and associations. The goals range from targeting users with relevant advertising to behavioral pattern recognition to aimless harvesting of records for as yet unknown future use.

Every technology and service we use, from banking to communications to transport, combined with the massive visual surveillance we encounter daily, generates a historically unprecedented amount of information about our whereabouts, mapping out countless connections between people, businesses, locations and things.

In practical terms, the depth and historical reach of metadata collection is similar to having someone follow you around 24/7, online and offline, recording everything you do and who you do it with, stopping short only of listening to your conversations. This clearly contradicts the dominant public narrative that metadata alone cannot be used to infer specific sensitive details about you.

With the Internet of Things bringing billions of new devices online in the next few years — from cars to smart homes to public utilities and healthcare systems — even more metadata will be fed into the global commercial databases, adding yet another rich and often unprotected layer of information about organizations, individuals and nations.

Today’s corporate data collection, particularly of metadata, is easy and cheap, and it often occurs without meaningful user input and proper informed consent. Most people don’t know where their personal or business activity logs reside and for how long, how they are shared, what conclusions are derived from this data and how it may impact their personal lives or business prospects.

Blurring lines between content and metadata

“We kill based on metadata,” an infamous statement by former NSA director Michael Hayden, is a reflection of the intelligence community’s understanding that activity logs have become so exhaustive that they are just as powerful in providing insight into people’s lives and minds as the content of their communications.

A new study by Stanford University found “telephone metadata densely interconnected, susceptible to re-identification, and enabling highly sensitive inferences.” When metadata is used and correlated with other open-source data without any restrictions, it can reveal profoundly intimate information about individuals. And, unlike the content of digital communications, it is not protected under the Fourth Amendment and can be surprisingly trivial to obtain without a warrant.

Our national policy discourse, so intensely focused on the precedence of digital content over metadata, only further exacerbates the imbalance in how private industry, from global corporations to small startups, treats these two types of data. Most activity logs across global databases, as massive as they are, are stored unencrypted with few safeguards against exposure, and they are not properly secured or anonymized when shared with third parties.

Collecting and storing any information insecurely, metadata included, clearly fails the duty of care companies owe to their users. As a result, the global attack surface is rapidly expanding, exposing individuals, organizations and government systems to vulnerabilities that lead to unauthorized collection and use of sensitive data.

Digital toxic waste: Why metadata should not live forever

With no defense being 100 percent impenetrable, private companies, as the predominant collectors and custodians of information, need to begin thinking long-term about why and how they collect and store our activity logs. When it becomes almost impossible to secure such large data sets, they turn into hazardous waste and a cause for user distrust rather than a source of cash flow.

Think about what you can learn about a person or a company by simply looking through their activity logs across different networks — the answer is likely “too much.” While some data — content or otherwise — may need to be retained for several years for compliance or other reasons, there is a lot more information that does not need to live forever. The less time the metadata lives and the fewer servers it touches, the more secure we all are against targeted criminal attacks and cyber espionage.

As information security becomes a national priority with cyber threats reaching epidemic proportions, both the tech community and policy makers must make it significantly harder and exponentially more expensive to exploit networks and databases containing activity logs.

Here is an easy fix: Limit metadata collection to what is essential to your business, and retain it only for a short period of time. In addition, anonymize and encrypt the data, and follow responsible information disposal processes.
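What that fix might look like in practice can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a description of any particular company's system: the field names, the 30-day window and the secret key are all hypothetical, and a keyed hash is pseudonymization rather than full anonymization. Encryption at rest is left out because it would require a library beyond the standard one.

```python
import hashlib
import hmac
from datetime import datetime, timedelta, timezone

# Hypothetical policy values; a real service would choose its own.
ESSENTIAL_FIELDS = {"user", "event", "timestamp"}
RETENTION = timedelta(days=30)


def pseudonymize(identifier: str, key: bytes) -> str:
    """Replace an identifier with a keyed hash (HMAC-SHA256)."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def minimize(record: dict, key: bytes) -> dict:
    """Keep only the essential fields and pseudonymize the user identifier."""
    kept = {k: v for k, v in record.items() if k in ESSENTIAL_FIELDS}
    kept["user"] = pseudonymize(kept["user"], key)
    return kept


def purge_expired(records: list, now: datetime) -> list:
    """Drop every record older than the retention window."""
    cutoff = now - RETENTION
    return [r for r in records if r["timestamp"] >= cutoff]


# Usage with made-up activity logs.
key = b"example-secret-key"  # assumption: kept outside the log store
now = datetime(2016, 7, 17, tzinfo=timezone.utc)
raw = [
    {"user": "alice@example.com", "event": "login",
     "ip": "203.0.113.7", "timestamp": now - timedelta(days=2)},
    {"user": "bob@example.com", "event": "login",
     "ip": "203.0.113.9", "timestamp": now - timedelta(days=90)},
]
stored = purge_expired([minimize(r, key) for r in raw], now)
```

In this sketch the IP address is never stored, the email address is replaced by a keyed hash, and the 90-day-old entry is discarded, so a breach of the log store would expose far less than the raw logs would.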

So long as we keep historically detailed activity logs across services, private or public, without effective means to clear data that is no longer needed or cannot be secured, encryption remains a half-measure, giving only a temporary and illusory sense of security.
