GDPR panic may spur data and AI innovation

If AI innovation runs on data, the European Union’s new General Data Protection Regulation (GDPR) seems poised to freeze AI advancement. The regulation prescribes a utopian data future where consumers can refuse companies access to their personally identifiable information (PII). Although the enforcement deadline has passed, most companies still lack the technical infrastructure and staff needed to meet these requirements.

Coincidentally, the barriers to GDPR compliance are also the bottlenecks to widespread AI adoption. Despite the hype, enterprise AI is still nascent: Companies may own petabytes of data that can be used for AI, but fully digitizing that data, knowing what the data tables actually contain and understanding who can access that data, where and how remains a herculean coordination effort for even the most empowered internal champion. It’s no wonder that many scrappy AI startups find themselves bogged down by customer data cleanup and custom integrations.

As multinationals and Big Tech overhaul their data management processes and tech stack to comply with GDPR, here’s how AI and data innovation counterintuitively also stand to benefit.

How GDPR impacts AI

GDPR covers the collection, processing and movement of data that can be used to identify a person, such as a name, email address, bank account information, social media posts, health information and more, all of which are currently used to power AI algorithms for tasks ranging from targeting ads to identifying terrorist cells.

The penalty for noncompliance can reach 4 percent of global annual revenue or €20 million, whichever is higher. To put that in perspective: 4 percent of Amazon’s 2017 revenue is $7.2 billion, Google’s is $4.4 billion and Facebook’s is $1.6 billion. The regulation protects people located in the EU, regardless of citizenship, and it also binds vendors upstream and downstream of the companies that collect PII.
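
As a rough illustration of the "whichever is higher" rule, the ceiling on a fine can be sketched in a few lines. The function name and the two revenue figures are hypothetical, amounts are assumed to be in euros, and currency conversion is ignored.

```python
def max_gdpr_fine(global_annual_revenue_eur: float) -> float:
    """Ceiling on a fine: the greater of 4% of revenue or EUR 20 million."""
    return max(0.04 * global_annual_revenue_eur, 20_000_000.0)


# Hypothetical companies, revenue in euros.
small = max_gdpr_fine(100_000_000)      # 4% is 4 million, so the 20 million figure applies
large = max_gdpr_fine(10_000_000_000)   # 4% is 400 million, which exceeds 20 million
```

For a small company the fixed €20 million figure dominates; for a multinational the percentage does.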

Article 22 of the GDPR, titled “Automated Individual Decision-making, including Profiling,” prescribes that AI cannot be used as the sole decision-maker in choices that have legal or similarly significant effects on users. In practice, this means an AI model cannot be the only step for deciding whether a borrower can receive a loan; the customer must be able to request that a human review the application.

One way to avoid the cost of compliance, which includes hiring a data protection officer and building access controls, is to stop collecting data on EU residents altogether. This would bring PII-dependent AI innovation in the EU to a grinding halt. With the EU representing about 16 percent of global GDP, 11 percent of global online advertising spend and 9 percent of the global population in 2017, however, Big Tech will more likely invest heavily in solutions that will allow them to continue operating in this market.

Transparency mandates force better data accessibility

GDPR mandates that companies collecting consumer data must enable individuals to know what data is being collected about them, understand how it is being used, revoke permission to use specific data, correct or update data and obtain proof that the data has been erased if the customer requests it. To meet these potential requests, companies must shift from indiscriminately collecting data in a piecemeal and decentralized manner to establishing an organized process with a clear chain of control.

Any data that companies collect must be immediately classified as either PII or de-identified and assigned the correct level of protection. Its location in the company’s databases must be traceable with an auditable trail: GDPR mandates that organizations handling PII must be able to find all copies of regulated data, regardless of how and where it is stored. These organizations will need to assign someone to manage their data infrastructure and fulfill these user privacy requests.
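
A minimal sketch of what classify-on-ingestion plus a traceable location index might look like follows. Everything here is an assumption for illustration: the `DataCatalog` class, the `PII_FIELDS` list and the storage location names are hypothetical, and a real deployment would rely on a legally reviewed field list and durable storage rather than in-memory structures.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical field names treated as PII; a real list needs legal review.
PII_FIELDS = {"name", "email", "bank_account", "health_record"}


@dataclass
class DataCatalog:
    # user_id -> set of storage locations known to hold that user's PII
    locations: dict = field(default_factory=dict)
    # append-only trail of (timestamp, action, user_id, location)
    audit_log: list = field(default_factory=list)

    def classify(self, record: dict) -> str:
        """Label a record 'PII' if it carries any field listed in PII_FIELDS."""
        return "PII" if record.keys() & PII_FIELDS else "de-identified"

    def register(self, user_id: str, record: dict, location: str) -> str:
        """Classify a record at ingestion and remember where PII copies live."""
        label = self.classify(record)
        if label == "PII":
            self.locations.setdefault(user_id, set()).add(location)
        self._log("register:" + label, user_id, location)
        return label

    def find_copies(self, user_id: str) -> set:
        """Return every known location of a user's PII and log the lookup."""
        self._log("lookup", user_id, "*")
        return set(self.locations.get(user_id, set()))

    def _log(self, action: str, user_id: str, location: str) -> None:
        self.audit_log.append((datetime.now(timezone.utc), action, user_id, location))


catalog = DataCatalog()
catalog.register("u1", {"email": "a@example.com"}, "crm.contacts")
catalog.register("u1", {"name": "A. User", "plan": "pro"}, "warehouse.accounts")
catalog.register("u1", {"page_views": 12}, "warehouse.metrics")
copies = catalog.find_copies("u1")
```

Because every registration and lookup is appended to `audit_log`, the person handling a privacy request can show both where the data sits and who asked about it.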

Having this data infrastructure and these management processes in place will greatly lower a company’s barriers to deploying AI. By fully understanding its data assets, the company can plan where to deploy AI in the near term using the data it already has. Moreover, once it builds an AI road map, the company can determine where it needs additional data to build more complex and valuable AI algorithms. With the data streams simplified, storage mapped out and a chain of ownership established, the company can more effectively engage with AI vendors to deploy their solutions enterprise-wide.

More importantly, GDPR will force many companies dragging their feet on digitization to finally bite the bullet. The mandates require that data be portable: Companies must provide a way for users to download all of the data collected about them in a standard format. Currently, only 10 percent of all data is collected in a format that eases analysis and sharing, and more than 80 percent of enterprise data today is unstructured, according to Gartner estimates.
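
To make the portability requirement concrete, here is a hedged sketch of a per-user export. It assumes, purely for illustration, that each internal store can be read as a list of dictionaries keyed by a `user_id` field and that JSON counts as the standard format; the function and store names are hypothetical.

```python
import json


def export_user_data(user_id: str, stores: dict) -> str:
    """Gather one user's rows from every store into a single JSON document."""
    bundle = {
        store_name: [row for row in rows if row.get("user_id") == user_id]
        for store_name, rows in stores.items()
    }
    return json.dumps({"user_id": user_id, "data": bundle}, indent=2, sort_keys=True)


# Hypothetical in-memory stores standing in for real databases.
stores = {
    "orders": [
        {"user_id": "u1", "item": "book"},
        {"user_id": "u2", "item": "lamp"},
    ],
    "support_tickets": [{"user_id": "u1", "subject": "refund"}],
}
export = export_user_data("u1", stores)
```

The hard part in practice is not the serialization but knowing which stores to list, which is why unstructured and undigitized data is the real obstacle.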

Much of this structuring and information extraction will initially have to be done manually, but Big Tech companies and many startups are developing tools to accelerate this process. According to PwC, the sectors furthest behind on digitization are healthcare, government and hospitality, all of which handle large amounts of unstructured data containing PII. We could expect to see a flood of AI innovation in these categories as the data becomes easier to access and use.

Consumer opt-outs require more granular AI model management

Under GDPR guidelines, companies must let users prevent the company from storing certain information about them. If the user requests that the company permanently and completely delete all the data about them, the company must comply and show proof of deletion. How this mandate might apply to an AI algorithm trained on data that a user wants to delete is not specifically prescribed and awaits its first test case.

Today, data is pooled together to train an AI algorithm. It is unclear how an AI engineer would attribute the impact of a particular data point to the overall performance of the algorithm. If the enforcers of GDPR decide that the company must erase the effect of a unit of data on the AI model in addition to deleting the data, companies using AI must find ways to granularly explain how a model works and fine-tune the model to “forget” the data in question. Many AI models are black boxes today, and leading AI researchers are working to enable model explainability and tunability. The GDPR deletion mandate could accelerate progress in these areas.

In the nearer term, these GDPR mandates could shape best practices for UX and AI model design. Today, GDPR-compliant companies offer users the binary choice of allowing full, effectively unrestricted use of their data or no access at all. In the future, product designers may want to build more granular data access permissions.

For example, before choosing to delete Facebook altogether, a user can refuse companies access to specific sets of information, such as their network of friends or their location data. AI engineers anticipating the need to trace the effect of specific data on a model may choose to build a series of simple models optimizing on single dimensions, instead of one monolithic and very complex model. This approach may have performance trade-offs, but would make model management more tractable.
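
The idea of several simple single-dimension models can be sketched as follows. This is an illustrative toy, not a recommended architecture: the `PerDimensionEnsemble` class, the one-variable least-squares fit and the sample data are all assumptions. It shows only how a refused dimension can be left out at prediction time; erasing a user's rows from the training data would still require refitting each small model.

```python
from statistics import mean


def fit_single_feature(xs, ys):
    """Least-squares line y = a*x + b fitted on a single feature."""
    mx, my = mean(xs), mean(ys)
    var = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var if var else 0.0
    return a, my - a * mx


class PerDimensionEnsemble:
    def __init__(self):
        self.models = {}  # dimension name -> (slope, intercept)

    def fit(self, rows, target, dimensions):
        """Train one independent model per data dimension."""
        ys = [r[target] for r in rows]
        for dim in dimensions:
            self.models[dim] = fit_single_feature([r[dim] for r in rows], ys)

    def predict(self, row, permitted):
        """Average the predictions of models whose dimension the user allows."""
        preds = [a * row[d] + b for d, (a, b) in self.models.items() if d in permitted]
        if not preds:
            raise ValueError("no permitted dimensions to predict from")
        return mean(preds)


# Hypothetical training data with two dimensions a user might restrict.
rows = [
    {"friends": 1, "location": 10, "engagement": 2},
    {"friends": 2, "location": 20, "engagement": 4},
    {"friends": 3, "location": 30, "engagement": 6},
]
ensemble = PerDimensionEnsemble()
ensemble.fit(rows, target="engagement", dimensions=["friends", "location"])

# This user has refused access to location data, so only "friends" is used.
score = ensemble.predict({"friends": 4}, permitted={"friends"})
```

Each model touches exactly one dimension, so tracing or removing that dimension's influence means dropping or refitting one small model rather than untangling a monolithic one.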

Building trust for more data tomorrow

The new regulations require companies to protect PII with a level of security previously limited to patient health and consumer finance data. Nearly half of all companies recently surveyed by Experian about GDPR are adopting technology to detect and report data breaches as soon as they occur. As companies adopt more sophisticated data infrastructure, they will be able to determine who has and should have access to each data stream and manage permissions accordingly. Moreover, the company may also choose to build tools that immediately notify users if their information was accessed by an unauthorized party; Facebook offers a similar service to its employees, called a “Sauron alert.”

Although the restrictions may appear to reduce tech companies’ ability to access data in the short term, 61 percent of companies see additional benefits of GDPR-readiness beyond penalty avoidance, according to a recent Deloitte report. Taking these precautions to earn customer trust may eventually lower the cost of acquiring high-quality, highly dimensional data.

In this post-GDPR future, companies no longer have to infer intent from expensive schemes to sneakily capture customer information. Improved data infrastructure will have enabled early AI applications to demonstrate their value, encouraging more customers to voluntarily share even more information about themselves with trustworthy companies.

Unproven upside alone has always been insufficient to motivate cross-functional modernization, but the threat of a multi-billion-dollar penalty may finally spur these companies to action. More importantly, GDPR is only the first of many data privacy regulations to come, and many countries across the world look to it as a model for their own upcoming policies. As companies worldwide lay the groundwork for compliance and transparency, they’re also paving the way to an even more vibrant AI future.