DeepMind says no quick fix for verifying health data access

Why should you trust an advertising giant with the most sensitive personal data you possess: your medical records? That’s the hugely sticky issue Google-owned DeepMind is facing as it seeks to embed itself into the U.K.’s healthcare space — a big push publicly announced in February last year.

DeepMind is now fleshing out how it hopes to futureproof patient trust in commercial access to and monetization of their health data, via a blog post that puts a little more meat on the bones of a plan for a technical audit infrastructure first discussed last November — when DeepMind also confirmed it was building an access infrastructure for National Health Service (NHS) patient medical records. And the world-famous AI company is not sounding hugely confident of being able to build a verifiable audit system for health data — move over AlphaGo! There’s a new challenge for DeepMind to apply its collective wits to.

The end-game of DeepMind’s NHS access infrastructure plan is for the company to own a standard interface that could be rolled out to other NHS Trusts, enabling both DeepMind and third-party developers to more easily deliver apps into the U.K.’s healthcare system (with DeepMind positioned, for example, to charge other app-makers for access to the API it’s building).

On its own account, it has said that where AI and health intersect its future ambition is to be able to charge by results. But in the meantime, it needs patient data to train its AIs. And it’s that scramble for data that got DeepMind into early hot water last year.

Since 2015, the company has inked multiple agreements with U.K. NHS Trusts to gain access to patient data for various purposes, some but not all for AI research. The most wide-ranging of DeepMind’s NHS data-sharing arrangements to date, with the Royal Free NHS Trust — to build an app wrapper for an NHS algorithm to identify acute kidney injury — caused major controversy when a Freedom of Information (FOI) request revealed the scope of identifiable patient data the company was receiving. DeepMind and the Trust in question had not publicly detailed how much data was being shared.

Patient consent in that instance is assumed (meaning patients are not asked to consent), based on an interpretation of NHS healthcare data-sharing guidelines for so-called “direct patient care” that has been questioned by data protection experts and criticized by health data privacy advocacy group MedConfidential.

The original DeepMind-Royal Free data-sharing arrangement (it’s since been re-inked) also remains under investigation by the U.K.’s national data protection agency, the ICO, and under review by the National Data Guardian, the government appointee tasked with ensuring citizens’ health data is safeguarded and used properly.

Despite the ongoing probes, the app DeepMind built with London’s Royal Free NHS Trust has been deployed in the latter’s three hospitals. So you could say the AI company thinking about a health data access audit infrastructure at this point in proceedings is akin to a coach-driver talking about putting an as-yet-unconstructed cart on a horse that’s already been released to run around the fields — while simultaneously asking those being saddled up to trust it. (See also: DeepMind wasting no time PRing the apparent benefits of the Streams app created after it gained liberal access to Royal Free patients’ medical records.)

The overarching issue here is trust — trust that the sensitive healthcare data of patients is not being shared without the proper authorizations and/or without patient consent. And that patients are not left in the dark about who is being allowed to access their personal information and for what purposes.

DeepMind’s answer to the trust issue — and the controversy caused by how it went about acquiring NHS patient data in the first place — appears primarily to be a technical one. Though building an audit infrastructure after you’ve already gained access to data does not satisfy legal or privacy experts. And such a topsy-turvy trust trajectory may be unlikely to impress patients either. (Albeit DeepMind has also started engaging with patient groups, even if only after the controversy arose.)

In a blog post entitled “Trust, confidence and Verifiable Data Audit,” DeepMind paints a picture of a technical audit infrastructure that uses “mathematical assurance” and open source respectability to deliver “verifiable” data access audits that — it presumably hopes — will euthanize the trust issue, down the road. In the nearer term, its hope looks to be to kick the can of scrutiny far away from the “trust us” reality of how it is currently utilizing patient data (i.e. without a verifiable technical infrastructure to prove its claims, and while still under review by U.K. data protection bodies).

The Google-owned AI company writes:

Imagine a service that could give mathematical assurance about what is happening with each individual piece of personal data, without possibility of falsification or omission. Imagine the ability for the inner workings of that system to be checked in real-time, to ensure that data is only being used as it should be. Imagine that the infrastructure powering this was freely available as open source, so any organisation in the world could implement their own version if they wanted to.

Of course, that’s just introductory mood music. The meat of the post contains few concrete assurances, beyond a repeatedly stated conviction of how tough it will be for DeepMind to build a “Verifiable Data Audit for DeepMind Health,” as it describes the planned audit infrastructure.

This is “really hard, and the toughest challenges are by no means the technical ones” it writes — presumably an oblique reference to the fact that it needs to get buy-in from all the various healthcare and regulatory stakeholders. Ergo, it needs to gain their trust in its approach. (Which in turn explains the mood music, and the tight PR game.)

Timing and viability for the technical audit infrastructure also remain vague. So while, as noted above, the DeepMind-built Streams app is again in use in three London hospitals, construction of its slated trust-building audit system has not even begun.

And with the blog post replete with warnings about the challenges/difficulty of building the hoped-for infrastructure, the subtext sounds a lot like: “NB, this may actually not be possible.”

“Over the course of this year we’ll be starting to build out Verifiable Data Audit for DeepMind Health,” it writes early on. But by the end of the post it’s talking about “hoping to be able to implement the first pieces of this later this year” — so it’s shifted from “starting” to “hoping” within the course of the same blog post.

We’ve reached out to DeepMind to ask for clarity on its timeline for building the audit infrastructure and will update this post with any response.

In terms of additional detail on how the audit infrastructure might work, DeepMind says the aim is to build on the existing data logs created when its systems interact with health data, via an append-only “special digital ledger.” This would not be a decentralized blockchain (which it claims would be wasteful in terms of resources) but a DeepMind-controlled ledger with a tree-like structure, in which each new entry generates a cryptographic hash summarizing both the latest entry and all of the previous values — the idea being to make entries tamper-proof as the ledger grows. It says an entry would record: “the fact that a particular piece of data has been used, and also the reason why — for example, that blood test data was checked against the NHS national algorithm to detect possible acute kidney injury.”
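DeepMind hasn’t published a design for this ledger, but the structure it describes (an append-only log in which each new entry’s hash covers both that entry and everything that came before it) is essentially a hash chain, the building block of Merkle-tree-style transparency logs. A minimal Python sketch of the idea, using hypothetical entry fields, might look like this:

```python
import hashlib
import json
import time

GENESIS = "0" * 64  # placeholder hash representing the empty ledger


class AuditLedger:
    """Toy append-only ledger: each entry's hash also covers the previous hash,
    so it transitively summarizes the entire history of the log."""

    def __init__(self):
        self.entries = []    # list of (entry_dict, entry_hash) in append order
        self.head = GENESIS  # hash of the most recent entry

    def append(self, data_id: str, purpose: str) -> str:
        # 'data_id' and 'purpose' are hypothetical fields, e.g. a record
        # identifier and "blood test checked against NHS AKI algorithm".
        entry = {
            "timestamp": time.time(),
            "data_id": data_id,
            "purpose": purpose,
            "prev_hash": self.head,
        }
        entry_hash = hashlib.sha256(
            json.dumps(entry, sort_keys=True).encode()
        ).hexdigest()
        self.entries.append((entry, entry_hash))
        self.head = entry_hash
        return entry_hash

    def verify(self) -> bool:
        """Recompute the whole chain: editing or deleting any past entry
        changes every subsequent hash, so tampering is detectable."""
        prev = GENESIS
        for entry, stored_hash in self.entries:
            if entry["prev_hash"] != prev:
                return False
            recomputed = hashlib.sha256(
                json.dumps(entry, sort_keys=True).encode()
            ).hexdigest()
            if recomputed != stored_hash:
                return False
            prev = recomputed
        return prev == self.head
```

Strictly speaking, a structure like this is tamper-evident rather than tamper-proof: anyone holding an earlier head hash can later detect rewriting, which is the kind of “mathematical assurance” the blog post gestures at, but only if copies of those hashes are held outside DeepMind’s control.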

Notably it does not specify whether ledger entries will log when patient data is being used to train any AI models — something DeepMind and the Royal Free have previously said they aim to do — nor whether an audit trail will be created of how patient data changes AI models, i.e. to enable data inputs to be compared with patient outcomes and allow for some algorithmic accountability in the future. (On that topic DeepMind, probably the world’s most famous AI company, remains markedly silent.)

“We’ll build a dedicated online interface that authorised staff at our partner hospitals can use to examine the audit trail of DeepMind Health’s data use in real-time,” it writes instead. “It will allow continuous verification that our systems are working as they should, and enable our partners to easily query the ledger to check for particular types of data use. We’d also like to enable our partners to run automated queries, effectively setting alarms that would be triggered if anything unusual took place. And, in time, we could even give our partners the option of allowing others to check our data processing, such as individual patients or patient groups.”
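DeepMind gives no detail on what those automated queries would look like. Purely as an illustration, and reusing the hypothetical AuditLedger sketched above, a partner hospital’s “alarm” could be little more than a standing predicate run over new ledger entries:

```python
# Illustrative only: an automated "alarm" query a partner hospital might run
# against ledger entries (field names follow the AuditLedger sketch above).
ALLOWED_PURPOSES = {
    "AKI check against NHS national algorithm",
    "clinician viewed result in Streams",
}


def unusual_entries(ledger):
    """Return any logged data use whose stated purpose isn't on the agreed list,
    so a partner can be alerted rather than having to read the log by hand."""
    return [
        entry
        for entry, _ in ledger.entries
        if entry["purpose"] not in ALLOWED_PURPOSES
    ]
```

The purposes listed here are invented for the example; in practice they would be whatever uses a Trust has contractually agreed to.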

Looping patients into audits might sound nice and inclusive, but DeepMind goes on to flag the difficulty of actually providing any access for patient groups or individual patients as one of the major technical challenges standing in the way of building the system — so again this is best filed under “mood music” at this nascent point.

Discussing the “big technical challenges” — as it sees it — the first problem DeepMind flags is being able to ensure that all access to data is logged by the ledger. Because, obviously, if the system fails to capture any data interactions the entire audit falls apart. So really that’s not so much a “challenge” as a massive question mark about the feasibility of the entire endeavor.

Yet on this DeepMind merely rather tentatively writes:

As well as designing the logs to record the time, nature and purpose of any interaction with data, we’d also like to be able to prove that there’s no other software secretly interacting with data in the background. As well as logging every single data interaction in our ledger, we will also need to use formal methods as well as code and data centre audits by experts, to prove that every data access by every piece of software in the data centre is captured by these logs. We’re also interested in efforts to guarantee the trustworthiness of the hardware on which these systems run – an active topic of computer science research!

Frankly, I’d argue that there being zero chance of “secret software” getting clandestine access to people’s sensitive medical records would have to be a requirement, not an optional extra, for the proposed audit system to have a shred of credibility.
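DeepMind doesn’t say how it would achieve that completeness guarantee. One conventional pattern, sketched below purely as an illustration rather than anything the company has described, is to make a single logging gateway the only code path to the underlying records, so that a read cannot happen without a ledger entry being written first. Proving that no other path exists is precisely the part that needs the formal methods and data-center audits the blog post mentions.

```python
# Illustrative pattern only (not DeepMind's stated design): every record access
# is forced through one gateway that writes to the audit ledger before
# returning any data.
class LoggedRecordGateway:
    def __init__(self, ledger, record_store):
        self._ledger = ledger         # e.g. the AuditLedger sketched earlier
        self._records = record_store  # hypothetical mapping of record id -> data

    def read(self, data_id: str, purpose: str):
        # Log first, then read: a crash after logging leaves a spurious entry,
        # but a successful read can never occur without a ledger entry.
        self._ledger.append(data_id, purpose)
        return self._records[data_id]
```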

Notably, DeepMind also does not specify whether or not the “experts” it envisages auditing its data centers and infrastructure would be independent of the company itself. But obviously they would need to be — or again any audits the system delivers would not be worth the paper they’re written on.

We’ve reached out to DeepMind with questions about its intentions vis-à-vis open sourcing the technical audit infrastructure, and again will update this post with any response.

As with end-to-end encryption protocols, for example, it’s clear that for any technical audit solution to be credible DeepMind would need to open it up entirely — publishing detailed whitepapers and fully open sourcing all components, as well as having expert outsiders perform a thorough audit of its operation (likely on an ongoing basis, as the infrastructure gets updated/upgraded over time).

Nothing short of a full open sourcing would be required. Remember: This is the data processor itself proposing to build an audit system for the health data it is being granted access to by the data controller. So the conflict of interest is very clear.

DeepMind does not make that point, of course; rather it concludes its blog with a vague hope of getting help in realizing its vision from any generally interested others. “We hope that by sharing our process and documenting our pitfalls openly, we’ll be able to partner with and get feedback from as many people as possible, and increase the chances of this kind of infrastructure being used more widely one day, within healthcare and maybe even beyond.”

But if the company really wants to inculcate trust in its vision for overhauling healthcare delivery, it will need to make itself and its processes a lot more transparent and accountable than they have been thus far.

For example, it could start by answering questions such as: what is the legal basis for DeepMind processing the sensitive data of healthy patients who will never go on to develop acute kidney injury (AKI)?

And why did it and the Royal Free not pursue a digital integration approach for the Streams app that pulls in only a subset of data (that of patients who might be vulnerable) rather than the much broader swathe of medical records passed to DeepMind under the data-sharing arrangement?

Asked for comment on DeepMind’s audit infrastructure plans, Phil Booth, coordinator of MedConfidential, raised just such unanswered questions regarding the original data-sharing arrangement — pointing out that the ongoing issue is how and why the company got so much patient identifiable data in the first place, rather than quibbles over how data access might be managed after the fact.

Discussing the proposed audit infrastructure, Booth said: “In the case of Google’s dodgy deal with the Royal Free, this will eventually demonstrate to a patient that data was copied to Google ‘for direct care’ when they were nowhere near the hospital at the time. It should irrevocably log that Google got data that they were not entitled to access, and now refuse to answer questions about.”

“It’s like a black box recorder for a flight,” he added, of the audit infrastructure. “You always hope it’s not necessary, but if something goes wrong, it’s reassuring to know someone can figure out what happened after your plane flew into a mountain.”

Update: DeepMind has confirmed that it intends to fully open source all elements of the audit infrastructure, and that the “experts” required to audit the code and data centers pertaining to the audit infrastructure will be external to the company itself.

It also confirms that the audit ledger will log when patient data is used to train AI models — though it’s less clear whether the ledger will also chronicle how data changes DeepMind’s AI models, i.e. to support future algorithmic accountability by explaining how healthcare decisions were arrived at by AI-powered apps/services.

In a ‘best case scenario’, it says it’s aiming to have the audit system up and running this year.