AI report fed by DeepMind, Amazon, Uber urges greater access to public sector data sets

What are tech titans Google, Amazon and Uber agitating for to further the march of machine learning technology and ultimately inject more fuel into the engines of their own dominant platforms? Unsurprisingly, they’re after access to data. Lots and lots of data.

Specifically, they’re pushing for free and liberal access to publicly funded data — urging that this type of data continue to be “open by default,” and structured in a way that supports “wider use of research data.” After all, why pay to acquire data when there are vast troves of publicly funded information ripe to be squeezed for fresh economic gain?

Other items on this machine learning advancement wish-list include new open standards for data (including metadata); research study designs with the “broadest consents that are ethically possible”; and a stated desire to rethink the notion of “consent” as a core plank of good data governance — to grease the pipe in favor of data access and make data holdings “fit for purpose” in the AI age.

These suggestions come in a 125-page report published today by the Royal Society, aka the U.K.’s national academy of science, ostensibly aimed at fostering an environment in which machine learning technology can flourish and unlock mooted productivity gains and economic benefits. The question of who ultimately benefits, as more and more data gets squeezed to give up its precious insights, is the report’s overarching theme and its great unanswered question. (Though the supportive presence of voices from three of tech’s most powerful machine learning-deploying platform giants suggests one answer.)

Scramble for public data

The report, entitled Machine learning: the power and promise of computers that learn by example, is the work of the Royal Society’s working group on machine learning, whose 15-strong membership includes employees of three companies currently deploying machine learning at scale: Demis Hassabis, founder and CEO of Google DeepMind, along with DeepMind research scientist Yee Whye Teh; Neil Lawrence, Amazon’s director of machine learning; and Zoubin Ghahramani, chief scientist at Uber.

The report’s top-line recommendations distill the more fleshed-out concerns in the meat of its chapters, and end up giving encouragement rather more space than concern, as you might expect from a science academy — though the level of concern contained within its pages is notable nonetheless.

The report’s recommendations laud what is described as the U.K.’s “good progress” in increasing the accessibility of public sector data, urging “continued effort” towards “a new wave of ‘open data for machine learning’ by government to enhance the availability and usability of public sector data,” and calling for the government to “explore ways of catalysing the safe and rapid delivery” of new open standards for data that “reflect the needs of machine-driven analytical approaches.”

But an early glancing reference to “the value of strategic datasets” does get unpacked in more detail further into the report — with the recognition that early access to such valuable troves of publicly funded data could lock in commercial advantage. (Though you won’t find a single use of the word “monopoly” in the entire document.)

“It is necessary to recognise the value of some public sector data. While making such data open can bring benefits, considering how those benefits are distributed is important,” they write. “As machine learning becomes a more significant force, the ability to access data becomes more important, and those with access can attain a ‘first mover feedback’ advantage that can be significant. When there is such value at stake, it will be increasingly necessary to manage significant datasets or data sources strategically.”

There is no example of this kind of “first mover feedback advantage” set out in the report, but you could point to DeepMind’s data access partnerships with the U.K.’s National Health Service as a pertinent case study here. Not least as the original data-sharing arrangement that the Google-owned company reached with the Royal Free NHS Trust in London is controversial, having been agreed without patient knowledge or consent, and having scaled significantly in scope from its launch as a starter app hosting an NHS algorithm to (now) an ambitious plan to build a patient data API to broker third-party app makers’ access to NHS data. Also relevant, but unmentioned: the original DeepMind-Royal Free data-sharing agreement remains under investigation by U.K. data protection watchdogs. (It’s worth noting the Royal Society also has a separate working group on data governance — that’s due to publish a report this summer.)

Instead, the report flags up the value of NHS data — describing it as “one of the UK’s key data assets” — before going on to frame the notion of third-party access to U.K. citizens’ medical records as a case of “personal privacy vs public good,” suggesting that “appropriately controlled access mechanisms” could be developed to resolve what it dubs this “balancing act” (again, without mentioning that DeepMind has already appointed itself the task of developing just such a controlled access mechanism).

“If this balancing act is resolved, and if appropriately controlled access mechanisms can be developed, then there is huge potential for NHS data to be used in ways that will both improve the functioning of the NHS and improve healthcare delivery,” they write.

Yet exactly who stands to benefit economically from unlocking valuable healthcare insights from a publicly funded NHS is not discussed. Though common sense would tell you that Google/DeepMind believes there is a profitable business to be built off free access to millions of NHS patients’ health data and the first mover advantage that gives them — including the chance to embed themselves into healthcare service delivery via control of an access infrastructure.

In an accompanying summary to the report, Hermann Hauser, another member of the working group and co-founder of Amadeus Capital Partners, talks excitedly in a pullquote about potential transformative opportunities for businesses making use of machine learning tech. “There are exciting opportunities for machine learning in business, and it will be an important tool to help organisations make use of their — and other — data,” he is quoted as saying. “To achieve these potentially significant economic benefits, businesses will need to be able to access the right skills at different levels.”

The phrase “economic benefits” is at least mentioned here. But the raison d’être of investors is to achieve a good exit, and there has been a rash of exits of machine learning firms to big tech giants engaged in the war for AI talent; DeepMind’s sale to Google for more than $500 million in 2014 is just one example. So investors have their own dog in the fight for a less stringent public sector data governance regime — and still get to cash out if an AI startup they bet on sells to a tech giant, rather than scales into one itself.

Julia Powles, a tech law and policy researcher at Cornell Tech, gives short shrift to the notion that lots of entrepreneurs stand to benefit if the public sector data floodgates are opened. “The idea that small guys can make use of their data is just a ruse. It’s only the big that will profit,” she tells TechCrunch.

Seismic shifts

Another portion of the report is much preoccupied with skills — discussing ways the government could encourage “a strong pipeline of practitioners in machine learning,” as it puts it — including urging it to make machine learning a priority area for additional PhD places, and to make near-term funding available for 1,000 extra PhDs (or more). Machine learning PhDs are, of course, top of the hiring tree for the big tech giants with the most cash to suck up these highly prized recruits, keeping them from being hired by startups, or indeed from starting their own competing businesses. So any increase at the top academic tier will be Google et al.’s gain, first and foremost — all the more so if the public sector also pays to fund these extra PhD places.

The skills discussion (which includes suggestions to tweak school curricula to include machine learning over the next five years) later has to be weighed against another portion of the report considering the potential impact of AI on jobs. Here the report cannot avoid the conclusion that machine learning will at the very least “change” work — and may well lead to seismic shifts in the employment prospects of large swathes of the workforce, which could also, the authors recognize, increase societal inequality. All of which does rather undermine the earlier suggestion that “everyone” in society will be able to upskill for a machine learning-driven future, given you can’t acquire skills for jobs that don’t exist… So the risk of AI generating a drastically asymmetric wealth and employment outcome is at once firmly lodged in the report’s vision of future work and kicked into a no man’s land of collective (i.e. zero ownership) responsibility.

“The potential benefits accruing from machine learning and their possibly significant consequences for employment need active management,” they write. “Without such stewardship, there is a risk that the benefits of machine learning may accrue to a small number of people, with others left behind, or otherwise disadvantaged by changes to society.

“While it is not yet clear how potential changes to the world of work might look, active consideration is needed now about how society can ensure that the increased use of machine learning is not accompanied by increased inequality and increased disaffection amongst certain groups. Thinking about how the benefits of machine learning can be shared by all is a key challenge for all of society.”

Ultimately, the report does call for “urgent consideration” to be given to what it describes as “the ‘careful stewardship’ needed over the next ten years to ensure that the dividends from machine learning… benefit all in UK society.” And it’s true, as we’ve said before, that policymakers and regulators do need to step up and start building frameworks and determining rules to ensure machine learning technologists do not have the chance to asset-strip the public sector’s crown jewels before they’ve even been valued (not to mention leave future citizens unable to pay for the fancy services that will then be sold back to them, powered by machine learning models freely fattened up on publicly funded data).

But the suggested 10-year time frame seems disingenuous, to put it mildly. With — for instance — very large quantities of sensitive NHS data already flowing from the public sector into the hands of one of the world’s most valuable companies by market capitalization (Alphabet/Google/DeepMind), there would seem to be rather more short-term urgency for policymakers to address this issue — not leave it on the back burner for a decade or so. Indeed, parliamentarians have already been urging action on AI-related concerns like algorithmic accountability.

Perception and ethics

Public opinion is understandably a big preoccupation for the report authors — unsurprisingly so, given that a technology that potentially erodes people’s privacy and impacts their jobs risks being drastically unpopular. The Royal Society conducted a public poll on machine learning for the report, and say they found mixed views among Brits. Concerns apparently included “depersonalisation, or machine learning systems replacing valued human experiences; the potential impact of machine learning on employment; the potential for machine learning systems to cause harm, for example accidents in autonomous vehicles; and machine learning systems restricting choice, such as when directing consumers to specific products and services.”

“Ongoing public confidence will be central to realising the benefits that machine learning promises, and continued engagement between machine learning researchers and practitioners and the public will be important as the field develops,” they add.

The report suggests that large-scale machine learning research programs should include funding for “public engagement activities.” So there may at least, in the short term, be jobs for PR/marketing types to put a good spin on the “societal benefits of automation.” They also call for ethics to be taught as part of postgraduate study so that machine learning researchers are given “strong grounding in the broader societal implications of their work.” Which is a timely reminder that most of the machine learning tech already deployed in the wild, including commercially, has probably been engineered and implemented by minds lacking such a strong ethical grounding. (Not that we really need reminding.)

“Society needs to give urgent consideration to the ways in which the benefits from machine learning can be shared across society,” the report concludes. Which is another way of saying that machine learning risks concentrating wealth and power in the hands of a tiny number of massively powerful companies and individuals — at society’s expense. Whichever way you put it, there’s plenty of food for thought here.