Ever felt like you were being watched online? You know, like when you read something about New York, and the next site you visit shows you ads for New York hotels? As it turns out, on my computer, there were more than 130 companies tracking my every move (check yours here, then install this plug-in).
These companies are basically engaging in mass surveillance. Just as governments justify tracking us to prevent terrorist attacks, these companies are tracking us online, without our consent, because a marginal 0.7 percent of the population clicks on their ads.
And it’s not just online advertisers. From e-commerce websites to physical retail stores, everyone is now racing to capture more data about us. Don’t be surprised if your insurance company starts charging you more because of how it thinks you should live your life!
Don’t get me wrong, I use Facebook, Google and all those other services. I use them because I find them useful, fun or because I don’t have an alternative. But I do it knowing very well that I am partly giving away my right to privacy.
For the past several years, the implicit deal has been that a free service gets to collect your personal data and sell it to other companies, such as advertisers. This data party (to which you were not invited) will soon be over, though, because 93 percent of Internet users now want to control who can access their data.
It turns out that you could have the exact same service, with the exact same quality, while not giving away your personal data. This is called “Privacy by Design,” and is about adding privacy safeguards directly at the product design, algorithm and business model level, so no one can abuse the users’ data.
Not only is this the right thing to do ethically, it is also the right thing to do as a business if you want to exist in the long run.
A recent study has shown that more than half of Internet users are concerned with privacy, but feel it’s already too late. In another study, U.S. citizens ranked “corporate tracking of personal data” as the third biggest fear they have, just behind government corruption and cyberterrorism. It is even the second reason why people aren’t purchasing new connected objects. All these surveys are pointing at the same conclusion: Privacy is now a mainstream concern.
Privacy is a fundamental human right
It might have entered the spotlight recently, but privacy is not a new idea. It has been around at least since ancient Greece, when Hippocrates made medical confidentiality part of the physician’s oath: “I will respect the privacy of my patients, for their problems are not disclosed to me that the world may know.”
It really became a big deal in Europe, though, after the Second World War. Back then, governments compiled files containing specific personal information, such as name, address, political party, ethnicity or religion. When the Nazi regime occupied France, it used these centralized files to target Jews, first to arrest them, and eventually to murder them.
The scar this left was so profound that the right to privacy got a dedicated article in the Universal Declaration of Human Rights. Article 12 states:
No one shall be subjected to arbitrary interference with his privacy, family, home or correspondence, nor to attacks upon his honor and reputation. Everyone has the right to the protection of the law against such interference or attacks.
This was followed by a number of new laws, such as the 1951 French Law on Privacy, called “Secret Statistique” (i.e. “Statistical Secret”). In a nutshell, the law made it illegal for the government and corporations to collect and store sensitive personal data without user consent. It did allow, though, for the collection of aggregated data, as long as individuals could not be re-identified by cross-referencing these aggregated datasets. This is still how we legally define “anonymous data.”
My point here is that privacy is not a “nice to have” or a fancy marketing argument. Rather, it’s a fundamental cornerstone of our society, without which we are putting ourselves at great existential risk.
“Arguing that you don’t care about the right to privacy because you have nothing to hide is no different than saying you don’t care about free speech because you have nothing to say.” — Edward Snowden
Fortunately, companies like Apple are doing the right thing by taking a stand and protecting their customers’ privacy. Asking for a backdoor will only make it easier for attackers to gain access, weakening our cybersecurity even further and essentially dismissing decades of security best practices. And trying to forbid crypto altogether is plain stupid, since it is already in the public domain (there are even open source encrypted messaging apps). It’s about time we accept that everything is going dark, and concentrate on finding ways to fight crime under those conditions.
Backdoors are not just an issue for surveillance, they also enable hackers to steal the data that a company has about you. Over the past decade, nearly two billion records have been stolen, 600 million of them in 2014 alone.
Some of those hacks had terrible consequences. For example, when Ashley Madison got hacked, more than 32 million people were exposed for adultery, resulting in countless divorces and even suicides.
“What all these data breaches are teaching us is that data is a toxic asset and saving it is dangerous.” — Bruce Schneier
The more sensitive data you store, the more you become a target for hackers and surveillance. Given that everybody eventually gets hacked, the only safe thing to do is to not have the data in the first place.
How you can build privacy by design
Given all of the above, you might be tempted to conclude that we shouldn’t use any of these services in the first place. But through good engineering, design and technology, we could have all of them without any risk to our privacy.
At Snips, we believe that privacy is an integral part of building an AI product. If you are collecting sensitive user data, it is your duty to protect your users from potential abuses.
Whether you are building a bot, an AI assistant or a smart IoT device, you should therefore ask yourself three questions:
- What are you protecting against?
- Are you leaking information indirectly?
- What trade-offs do you have to make for the product to exist?
For example, if you want to protect against corporate abuse, your systems should be built such that no one can take advantage of your users’ data, whether it’s you internally, hackers, governments or a new evil CEO.
Assessing information leakage is much harder, since even without storing data on users, insights can still be extracted indirectly through metadata. For instance, even if you don’t store the data itself, a corrupt employee could add an internal proxy that captures the traffic and stores it somewhere else.
Furthermore, to offer full privacy, all third-party services involved need to be private, as well. Because this is not usually the case, you need to accept trade-offs, and do the best you can in the short term to still exist in the long term. The idea is that by building a huge company with privacy as a core value, you will lead by example and force the market to move in the same direction.
A good example of a common trade-off is privacy versus anonymity. Privacy is about hiding what you are doing, while anonymity is about hiding your identity. Ideally you want both, but in practice this is not always possible.
Here are some things you can already do.
The most straightforward thing you can do is process as much of the data as possible directly on the device that produced it. For example, on a smartphone, you can process geolocation traces locally, so that they are never sent to your servers. Everything from cleaning up trajectories to inferring transportation modes and places visited can be done on-device. Even machine learning and natural language processing can now be done that way.
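To make this concrete, here is a minimal sketch of on-device place inference: group consecutive GPS fixes that linger within a small bounding box and keep only the cluster centroids. The thresholds, the trace and the function name are illustrative assumptions, not values from any real product; the point is that the raw trace never has to leave the device.

```python
# Toy on-device "places visited" inference. Only the handful of place
# centroids would ever need to be used by the app -- the raw GPS trace
# stays local.

def infer_places(trace, radius=0.001, min_points=3):
    """Return centroids of clusters where the user lingered."""
    places, cluster = [], []
    for lat, lon in trace:
        # If this fix left the current cluster's bounding box, close the cluster
        if cluster and (abs(lat - cluster[0][0]) > radius
                        or abs(lon - cluster[0][1]) > radius):
            if len(cluster) >= min_points:  # enough fixes: call it a place
                places.append((sum(p[0] for p in cluster) / len(cluster),
                               sum(p[1] for p in cluster) / len(cluster)))
            cluster = []
        cluster.append((lat, lon))
    if len(cluster) >= min_points:  # flush the last cluster
        places.append((sum(p[0] for p in cluster) / len(cluster),
                       sum(p[1] for p in cluster) / len(cluster)))
    return places

trace = [(48.8500, 2.3500), (48.8501, 2.3501), (48.8500, 2.3502), (48.8502, 2.3500),
         (48.8550, 2.3550), (48.8600, 2.3600),   # in transit, no lingering
         (48.8650, 2.3700), (48.8651, 2.3701), (48.8650, 2.3702), (48.8652, 2.3700)]
places = infer_places(trace)  # two places inferred, entirely on-device
```

A production version would use proper geodesic distances and dwell times, but the shape of the computation is the same, and it is cheap enough to run on a phone.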
There are, of course, major challenges in doing this, from hardware limitations (CPU, RAM, battery…) to software limitations (the OS can kill your process at any time, right in the middle of a computation). This means the algorithms need to be adapted to be lightweight, fault-tolerant and fast.
There is a cool side-effect of computing on-device though: it greatly reduces infrastructure and network costs, because less data is sent and processed in the cloud! And it turns out to also be a major advantage for IoT.
Some features, though, cannot run on a single device. Examples include social features, cross-device handoff, accessing huge databases or performing heavy computations. In these cases, you can use modern cryptography techniques to guarantee some level of privacy.
For example, you can use Private Information Retrieval (PIR) techniques to privately query a database. The device sends a request to a server without the server learning which record is being asked for. The server then returns a result that only the device making the query can understand.
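The simplest PIR constructions assume two non-colluding servers, each holding a full copy of the database (single-server variants exist but rely on homomorphic encryption). A toy information-theoretic sketch: the client sends a random subset of indices to one server and the same subset with the target index flipped to the other; each server returns the XOR of the selected records, and neither one alone learns anything about which record was wanted.

```python
# Toy two-server PIR over a database of integers. Each server sees only a
# uniformly random bit mask, so neither learns which record the client wants.
import secrets

def server_answer(db, mask):
    # XOR together the records selected by the mask bits
    ans = 0
    for record, bit in zip(db, mask):
        if bit:
            ans ^= record
    return ans

def pir_query(db_size, index):
    # A random subset, and the same subset with the target index flipped
    q1 = [secrets.randbelow(2) for _ in range(db_size)]
    q2 = list(q1)
    q2[index] ^= 1
    return q1, q2

db = [13, 7, 42, 99]               # both servers hold an identical copy
q1, q2 = pir_query(len(db), 2)     # client wants record 2
record = server_answer(db, q1) ^ server_answer(db, q2)  # = db[2], i.e. 42
```

All records except the target one appear in both answers and cancel out under XOR, which is why the client recovers exactly `db[2]` while each server sees pure noise.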
Another type of crypto that gets me really excited these days is called homomorphic encryption. In a nutshell, it lets you compute directly on encrypted data, meaning the server never sees what it is manipulating! The device encrypts the data and sends it to the server, which runs some algorithms on it and returns a result it cannot understand. Only the user’s device can then decrypt it.
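An additively homomorphic scheme like Paillier illustrates the idea: multiplying two ciphertexts yields an encryption of the sum of the plaintexts, so a server can add encrypted numbers it cannot read. The sketch below uses demo-sized primes purely for illustration; it is emphatically not secure, and a real deployment would use ~2048-bit moduli via a vetted library.

```python
# Toy Paillier additively homomorphic encryption. DEMO ONLY: tiny primes.
import math
import secrets

p, q = 17, 19                  # demo-sized primes
n, n2 = p * q, (p * q) ** 2
g = n + 1
lam = math.lcm(p - 1, q - 1)   # Carmichael function of n (Python 3.9+)
mu = pow(lam, -1, n)           # with g = n+1, L(g^lam mod n^2) = lam mod n

def encrypt(m):
    r = secrets.randbelow(n - 1) + 1
    while math.gcd(r, n) != 1:  # r must be invertible mod n
        r = secrets.randbelow(n - 1) + 1
    return pow(g, m, n2) * pow(r, n, n2) % n2

def decrypt(c):
    return (pow(c, lam, n2) - 1) // n * mu % n

c1, c2 = encrypt(20), encrypt(22)
product = c1 * c2 % n2      # the server multiplies ciphertexts it cannot read...
total = decrypt(product)    # ...and only the client learns the sum: 42
```

Paillier only supports addition (it is *partially* homomorphic); supporting arbitrary computation is exactly what the fully homomorphic schemes mentioned below aim for.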
With fully homomorphic encryption, you could do machine learning on encrypted data, or distribute computing on devices that are inherently insecure.
Although that’s still a few years away, you can already do practical things, like enabling users to know which of their friends are nearby without sending their actual location, or recommending a place that multiple users would like without sending the history of where they have been.
Decentralization of services
On-device computing and cryptography are great ways to prevent your company from accessing users’ personal data. But you will most likely still need to use third-party providers for some parts of your product.
A general pattern to apply here is to never centralize all the user data in one place. Rather, try to use as many different providers as possible, so that each of them only has a tiny piece of the puzzle. What you want to avoid is creating a single point of failure by putting all the data in the same place, therefore making that place the only thing that needs to be hacked to get access to someone’s entire life.
This is important because hacking into multiple systems to retrieve all the data is much harder than hacking into just one (don’t forget every system eventually gets hacked). So by using three or more independent providers, you are making it several orders of magnitude harder to steal all the data!
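A toy way to capture the spirit of this (in practice “different providers” usually means functional splitting, but the principle is the same) is XOR secret sharing: split a sensitive value into shares that are each indistinguishable from random noise, and give one share to each provider. Any subset short of all the shares reveals nothing.

```python
# Splitting a secret into XOR shares so each provider stores only noise.
import secrets

def split(secret: bytes, n_shares: int) -> list:
    # n-1 shares are uniformly random; the last one XORs back to the secret
    shares = [secrets.token_bytes(len(secret)) for _ in range(n_shares - 1)]
    last = secret
    for share in shares:
        last = bytes(a ^ b for a, b in zip(last, share))
    return shares + [last]

def combine(shares: list) -> bytes:
    out = bytes(len(shares[0]))
    for share in shares:
        out = bytes(a ^ b for a, b in zip(out, share))
    return out

shares = split(b"user@example.com", 3)   # one share per independent provider
restored = combine(shares)               # only the client ever holds all three
```

Here an attacker would have to compromise all three providers to learn anything at all, rather than just the weakest one.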
For the things where you really have no choice and need to collect user data, such as app analytics, do it via opt-in only. This requires being transparent about what you do with the data, explained simply and clearly, with an easy way to opt out later.
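The key design decision is that collection is off by default and that revoking consent also drops what was already gathered. A minimal sketch (class and method names are hypothetical):

```python
# Minimal consent-gated analytics: nothing is recorded without an explicit
# opt-in, and opting out discards anything collected so far.

class Analytics:
    def __init__(self):
        self.opted_in = False   # off by default: the user must act to enable it
        self.events = []

    def set_consent(self, granted: bool):
        self.opted_in = granted
        if not granted:
            self.events.clear()  # opting out also deletes prior data

    def track(self, event: str):
        if self.opted_in:        # silently drop everything without consent
            self.events.append(event)

a = Analytics()
a.track("app_open")          # ignored: no consent yet
a.set_consent(True)
a.track("feature_used")      # recorded: user explicitly opted in
```

The same default-off, delete-on-revoke pattern applies whatever analytics backend you actually use.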
The whole idea behind privacy by design is to provide protection now and in the future, regardless of governance, corruption and security breaches. When done right, privacy can vastly reduce the impact of attacks on your business and reputation, since there would be no sensitive data to leak.
Ideally though, we shouldn’t care about privacy. Not because it’s unimportant, but rather because it would be by default in everything, offering an ethical baseline that makes us feel safe. We shouldn’t have to worry about our privacy, just as we shouldn’t have to worry about war, discrimination, hunger, disease or money.
If you are a CEO, you have two choices: be in denial, ignore privacy and risk your company disappearing if the market turns; or, be a forward-thinking leader who embraces it as a strategic advantage, thereby building a future-proof organization that is both ethical and beneficial to society.