Toward transitive data privacy and securing the data you don’t share

We are spending a lot of time discussing what happens to data when you explicitly or implicitly share it. But what about data that you have never ever shared?

Your cousin’s DNA

We all share DNA; after all, it seems we are all descendants of a few tribes. But the more closely related two people are, the closer the DNA match. On average we share about 50 percent of our DNA with full siblings and about 12.5 percent with first cousins, and there is still some meaningful match even between distant relatives (how much depends on the family tree distance).
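To make "family tree distance" a little more concrete, here is a rough sketch of the textbook approximation for how much DNA full cousins are expected to share. It assumes two shared ancestors and a halving of the expected overlap at every parent-child step; real matches vary around these averages, and the function name is only illustrative.

```python
def expected_shared_fraction(cousin_degree: int) -> float:
    """Average fraction of DNA shared by full k-th cousins (approximation).

    k-th cousins are separated by 2 * (k + 1) parent-child steps through each
    of their two shared ancestors, and each step halves the expected overlap.
    """
    if cousin_degree < 1:
        raise ValueError("cousin_degree must be 1 or greater")
    steps = 2 * (cousin_degree + 1)
    return 2 * 0.5 ** steps


# Second, third and fourth cousins: the expected overlap shrinks fast
# but never quite reaches zero.
for degree in (2, 3, 4):
    print(f"cousin degree {degree}: {expected_shared_fraction(degree):.3%}")
```

The overlap drops by a factor of four with each extra degree, which is why distant matches are weak individually but still useful when several relatives are in the same database.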

In short, even if you have never taken a DNA test, if one or more of your blood relatives has taken one and shared that data, some of your DNA is effectively now available for a match.

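While this may have seemed like theory a few weeks ago, investigators identified the suspect in the Golden State Killer case using exactly this method: they matched crime-scene DNA against profiles that his relatives had uploaded to a genealogy database.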

Cambridge Analytica

A similar thing happened when data was misused by Cambridge Analytica. Even if you never used the quiz app on the Facebook platform but your friends did, they essentially revealed private information about you without your consent or knowledge.

The number of users that took the quiz was shockingly small: only about 300,000 users participated. And yet upwards of 50 million people (by later estimates, as many as 87 million) eventually had their data collected by Cambridge Analytica.
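A toy sketch of why so few participants can expose so many people: if an app can read each participant's friend list, the exposed set is the participants plus all of their friends. The friend graph and names below are made up for illustration, and this is not a description of how the actual quiz app was built.

```python
from __future__ import annotations


def exposed_profiles(friends: dict[str, set[str]], participants: set[str]) -> set[str]:
    """Return everyone whose profile is visible: participants and their friends."""
    exposed = set(participants)
    for person in participants:
        exposed |= friends.get(person, set())
    return exposed


# Hypothetical friend graph: only "ana" takes the quiz.
friends = {
    "ana": {"ben", "cy", "dee"},
    "ben": {"ana", "eli"},
    "cy": {"ana"},
    "dee": {"ana", "fay"},
}
participants = {"ana"}
exposed = exposed_profiles(friends, participants)
print(len(participants), "participant exposes", len(exposed), "profiles")

# Back-of-the-envelope ratio using the figures quoted in this section.
profiles_per_participant = 87_000_000 / 300_000
print(profiles_per_participant)  # 290.0
```

At the upper estimate, each person who took the quiz exposed roughly 290 profiles on average, almost all of them belonging to people who never saw the app.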

And all of this was done legally and while complying with the platform requirements at that time.

Transitive data privacy

The word transitive simply means that if A is related to B in a certain way, and B is related to C in the same way, then A is related to C in that way too. For example, being full siblings is a transitive relation. If Alice and Bob are full siblings, and Bob and Chamath are full siblings, then Alice and Chamath are full siblings.
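For readers who think in code, the definition can be checked mechanically. This is a minimal sketch that stores a made-up relation as a set of ordered pairs; `is_transitive` is an illustrative helper, not a library function.

```python
from __future__ import annotations


def is_transitive(relation: set[tuple[str, str]]) -> bool:
    """Return True if (a, b) and (b, c) in the relation always imply (a, c)."""
    return all(
        (a, d) in relation
        for (a, b) in relation
        for (c, d) in relation
        if b == c
    )


chain = {("A", "B"), ("B", "C")}
print(is_transitive(chain))                 # False: ("A", "C") is missing
print(is_transitive(chain | {("A", "C")}))  # True
```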

As private citizens and as corporations, we now have to think about transitive data privacy loss: privacy lost over data that reaches others through the people we are connected to.

The simplest version of this is if your boyfriend or girlfriend forwards your private photo or conversation screenshot to someone else.
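The forwarding case is really a reachability question: once one person can pass something on, everyone downstream of them can end up with it. Here is a small sketch, assuming a hypothetical "forwards to" graph in which each person lists who they might forward to.

```python
from collections import deque


def can_end_up_with(forwards_to, first_recipient):
    """Breadth-first search over the 'forwards to' graph from the first recipient."""
    seen = {first_recipient}
    queue = deque([first_recipient])
    while queue:
        person = queue.popleft()
        for nxt in forwards_to.get(person, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


# Hypothetical sharing graph: you only ever sent the photo to "partner".
forwards_to = {
    "partner": ["friend"],
    "friend": ["group_chat", "coworker"],
    "coworker": [],
}
print(sorted(can_end_up_with(forwards_to, "partner")))
```

You shared with one person, but the set of people who can end up holding the photo is decided by everyone else's sharing habits, not yours.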

Transitive sharing upside

While we have discussed a couple of clear negative examples, there are many ways transitive data relationships help us.

Every time you ask a friend to connect you to someone on LinkedIn for a job or a fundraise, you are leveraging the transitive relationship graph.

The DNA databases being created are primarily for social good  —  to help us connect with our roots and family, detect disease early and help medical research.

In fact, you could argue that a lot of the challenges we face today require more data sharing, not less. If your hospital cannot share data with your primary care doctor at the right time, or your clinical trial data cannot be accessed to monitor downstream effects, we cannot take care of our citizens’ health as we should. Organizations like the NIH, the VA, and CMS (Medicare) are working hard to make appropriate data sharing easier for healthcare providers.

Further, the good news is that there have been significant advances in security, particularly in encryption and hashing, that help companies protect against unintended side effects. More research is definitely called for. We can anonymize data, perturb it, and apply these techniques for protection while still deriving value and helping customers.
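As a rough illustration of what hashing and perturbing can look like in practice, here is a sketch of two well-known techniques that I am picking as examples: keyed hashing (HMAC-SHA256) to replace an identifier with a pseudonym, and Laplace noise, the kind used in differential privacy, to blur a count before it is released. The function names, the demo key and the epsilon value are assumptions for illustration only. Note that a pseudonym is not the same thing as full anonymization.

```python
import hashlib
import hmac
import random


def pseudonymize(identifier: str, secret_key: bytes) -> str:
    """Replace an identifier with a keyed hash so it cannot be looked up without the key."""
    return hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def perturb_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    """Add Laplace noise with scale 1/epsilon to a count (smaller epsilon, more noise)."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    # The difference of two independent exponential draws with rate epsilon
    # follows a Laplace distribution with scale 1/epsilon.
    noise = rng.expovariate(epsilon) - rng.expovariate(epsilon)
    return true_count + noise


# Demo values only. A real system would keep the key in a secrets manager and
# would use a vetted privacy library rather than Python's general-purpose RNG.
demo_key = b"demo-key-not-for-production"
print(pseudonymize("alice@example.com", demo_key))
print(perturb_count(1200, epsilon=0.5, rng=random.Random()))
```

The pseudonym still lets records about the same person be joined, and the noisy count is still close enough to be useful, which is the trade-off this paragraph is pointing at.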