Facebook data misuse scandal affects "substantially" more than 50M, claims Wylie

Chris Wylie, the former Cambridge Analytica employee turned whistleblower whose revelations about Facebook data being misused for political campaigning have wiped billions off the share value of the company in recent days and led to the FTC opening a fresh investigation, has suggested the scale of the data leak is substantially larger than has been reported so far.

Giving evidence today to a UK parliamentary select committee that’s investigating the use of disinformation in political campaigning, Wylie said: “The 50 million number is what the media has felt safest to report, because of the documentation that they can rely on, but my recollection is that it was substantially higher than that. So my own view is it was much more than 50M.”

We’ve reached out to Facebook about Wylie’s claim — but at the time of writing the company had not provided a response.

“There were several iterations of the Facebook harvesting project,” Wylie also told the committee, fleshing out the process through which he says users’ data was obtained by CA. “It first started as a very small pilot, firstly to see, most simply, is this data matchable to an electoral register… We then scaled out slightly to make sure that [Cambridge University professor Aleksandr Kogan] could acquire data in the speed that he said he could [via a personality test app called thisisyourdigitallife deployed via Facebook’s platform]. So the first real pilot of it was a sample of 10,000 people who joined the app; that was in late May 2014.

“That project went really well and that’s when we signed a much larger contract with GSR [Kogan’s company] in the first week of June… 2014. Where the app went out and collected surveys and people joined the app throughout the summer of 2014.”

The personal information the app was able to obtain via Facebook formed the “foundational dataset” underpinning both CA and its targeting models, according to Wylie.

“This is what built the company,” he claimed. “This was the foundational dataset that then was modeled to create the algorithms.”

Facebook has previously confirmed 270,000 people downloaded Kogan’s app — a data harvesting route which, thanks to the lax structure of Facebook’s APIs at the time, enabled the foreign political consultancy firm to acquire information on more than 50 million Facebook users, according to the Observer, the vast majority of whom would have had no idea their data had been passed to CA because they were never personally asked to consent to it.

Instead, their friends were ‘consenting’ on their behalf — likely also without realizing.
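As a rough illustration of how that friends-based access scaled, the two reported figures can be put side by side. The short Python sketch below is back-of-the-envelope arithmetic only: the function name is hypothetical, and it assumes no overlap between app users' friend lists, which is a simplification rather than anything Facebook or the Observer has reported.

```python
# Figures reported in the article; anything derived from them is a rough estimate.
APP_USERS = 270_000          # people Facebook says downloaded Kogan's app
AFFECTED_USERS = 50_000_000  # the Observer's reported figure for affected users


def implied_reach_per_app_user(app_users: int, affected_users: int) -> float:
    """Average number of profiles reached per app user.

    Assumes no overlap between friend lists (a simplifying assumption,
    not a reported fact).
    """
    return affected_users / app_users


ratio = implied_reach_per_app_user(APP_USERS, AFFECTED_USERS)
print(f"Roughly {ratio:.0f} profiles reached per app user")
```

On those numbers, each person who installed the app exposed data on the order of a couple of hundred other profiles, which is why the number of affected users dwarfs the number who ever saw a consent screen.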

Earlier this month, after the latest CA revelations broke, the DCMS committee asked Facebook founder Mark Zuckerberg to answer its questions in person, but he has so far declined the summons. It has, however, just been reported that he may finally appear before Congress to face questions about how users’ data has been so widely misused via his platform.

In a letter to the DCMS committee, dated yesterday, Facebook said it is working with regulators in different countries to confirm exactly how many local users have been affected by the data leak.

It adds that around 1 per cent of the users whose data was illicitly obtained by CA were European Union users. This small proportion seems unsurprising, given CA was working for the Trump campaign — and therefore aiming to gather data on Americans for 2016 presidential campaign targeting purposes. EU citizens’ data wouldn’t have had any relevance to that.

“There will be two sets of data,” Facebook writes in its letter to the committee discussing the data passed to CA. “The first is people who downloaded the app, and the second is the number of friends of those people who have their privacy settings set in such a way that the app could see some of their data. This second figure will be much higher than the first and we will look to provide both broken down by country as soon as we can.”

Facebook’s privacy settings have caused major regulatory and legal headaches for the company over the years. In 2012, for example, Facebook settled with the FTC over charges it had deceived users by “telling them they could keep their information on Facebook private, and then repeatedly allowing it to be shared and made public”.

And in 2011 and 2012, following a legal complaint by European privacy campaigner and lawyer Max Schrems, Facebook was urged by the Irish Data Protection Commissioner to tighten app permissions to avoid exactly the kind of friends data leakage that has now scaled into this major privacy scandal.

Instead, Facebook put off tightening up API permissions until as late as mid 2015 — thereby giving CA a window of opportunity to pull massive amounts of Facebook user data ahead of the 2016 US presidential election.

When CA’s (currently suspended) CEO, Alexander Nix, appeared before the DCMS committee in February he was asked whether it worked with GSR and what use it made of GSR data. At that time Nix claimed CA had not used any GSR data.

The company is continuing to push this line, claiming in a series of tweets today that while it paid $500k for GSR data it subsequently “deleted the data”. It further claims it used alternative data sources and data sets to build its models. “Our algorithms and models bear no trace of it,” it has also tweeted re: the GSR data.

(Following the session, CA has also now put out a longer response statement, disputing multiple parts of Wylie’s testimony and claiming he has “misrepresented himself and the company”. In this it also claims: “Cambridge Analytica does not hold any GSR data or any data derived from GSR data. We have never shared the GSR data with Aggregate IQ [another alleged affiliate company], Palantir or any other entity. Cambridge Analytica did not use any GSR data in the work that we did for the Donald J. Trump for President campaign.”)

Asked by the committee about Nix’s earlier, contradictory testimony, Wylie wondered out loud why CA spent “the better part of $1M on GSR”, pointing also to “copious amounts of email” and other documents he says he has provided to the committee as additional evidence, including invoicing and “match rates on the data”.

“That’s just not true,” he asserted of CA’s claim not to have used GSR (and therefore Facebook) data.

Kogan himself has previously claimed he was unaware exactly what CA wanted to use the data for. “I knew it was for political consulting but beyond that no idea,” he told Anderson Cooper in a TV interview broadcast on March 21, claiming also that he did not know that CA was working for Trump or whether they even used the data his app had gathered.

Kogan also suggested the data he had been able to gather was not very accurate at an individual level — claiming it would only be useful in aggregate to, for example, “understand the personality of New Yorkers”.

Wylie was asked by the committee how the data was used by CA. Giving an example, he said the company’s approach was to target different people for advertising based on their “dispositional attributes and personality traits”, which it sought to predict via patterns in the data.

He said:

For example, if you are able to create profiling algorithms that can predict certain traits, so let’s say a high degree of openness and a high degree of neuroticism, and when you look at that profile, that’s the profile of a person who’s more prone towards conspiratorial thinking, for example, they’re open enough to kind of connect to things that may not really seem reasonable to your average person. And they’re anxious enough and impulsive enough to start clicking and reading and looking at things, and so if you can create a psychological profile of a type of person who is more prone to adopting certain forms of ideas, conspiracies for example, you can identify what that person looks like in data terms. You can then go out and predict how likely somebody is going to be to adopt more conspiratorial messaging. And then advertise or target them with blogs or websites or various, what everyone now calls fake news, so that they start seeing all of these ideas, or all of these stories around them in their digital environment. They don’t see it when they watch CNN or NBC or BBC. And they start to go well why is it that everyone’s talking about this online? Why is it that I’m seeing everything here but the mainstream media isn’t talking about [it]… Not everyone’s going to adopt that, so the advantage of using profiling is you can find the specific group of people who are more prone to adopting that idea as your early adopters… So if you can find those people in your datasets because you know what they look like in terms of data you can catalyze a trend over time. But you first need to find what those people look like.

“That was the basis of a lot of our research [at CA and sister company SCL],” he added. “How far can we go with certain types of people. And who is it that we would need to target with what types of messaging.”

Wylie told the committee that Kogan’s company was set up exclusively for the purposes of obtaining data for CA, and said the firm chose to work with Kogan because another professor it had approached first had asked for a substantial payment up front and a 50% equity share, whereas Kogan had agreed to work on the project to obtain the data first, and consider commercial terms later.

“The deal was that [Kogan] could keep all the data and do research or whatever he wanted to do with it and so for him it was appealing because you had a company that was the equivalent of no academic grant could compete with the amount of money that we could spend on it, and also we didn’t have to go through all the compliance stuff,” added Wylie. “So we could literally just start next week and pay for whatever you want. So my impression at the time was that for an academic that would be quite appealing.”

“All kinds of people [had] access to the data”

Another claim made by Wylie during the session was that the secretive US big data firm Palantir helped CA build models off of the Facebook data — although he also said there was no formal contract in place between the two firms.

Wylie said Palantir was introduced to CA’s Nix by Sophie Schmidt, Google chairman Eric Schmidt’s daughter, during an internship at CA.

“We actually had several meetings with Palantir whilst I was there,” claimed Wylie. “And some of the documentation that I’ve also provided to the committee… [shows] there were senior Palantir employees that were also working on the Facebook data.”

The VC-backed firm is known for providing government, finance, healthcare and other organizations with analytics, security and other data management solutions.

“That was not an official contract between Palantir and Cambridge Analytica but there were Palantir staff who would come into the office and work on the data,” Wylie added. “And we would go and meet with Palantir staff at Palantir. So, just to clarify, Palantir didn’t officially contract with Cambridge Analytica. But there were Palantir staff who helped build the models that we were working on.”

Contacted for comment on this allegation, a Palantir spokesperson denied it entirely, providing TechCrunch with this emailed statement: “Palantir has never had a relationship with Cambridge Analytica nor have we ever worked on any Cambridge Analytica data.”

The committee went on to ask Wylie why he was coming forward to tell this story now, given his involvement in building the targeting technologies — and therefore also his interests in the related political campaigns.

Wylie responded by saying that he had grown increasingly uncomfortable with CA during his time working there and with the methods being used.

“Nothing good has come from Cambridge Analytica,” he added. “It’s not a legitimate business.”

In a statement posted to its Twitter account yesterday, CA’s acting CEO Alex Tayler sought to distance the firm from Wylie and play down his role there, claiming: “The source of allegations is not a whistleblower or a founder of the company. He was at the company for less than a year, after which he was made the subject of restraining undertakings to prevent his misuse of the company’s intellectual property.”

Asked whether he’s received any legal threats since making his allegations public, Wylie said the most legal pushback he’s received so far has come from Facebook, rather than CA.

“It’s Facebook who’s most upset about this story,” he told the committee. “They’ve sent some fairly intimidating legal correspondence. They haven’t actually taken action on that… They’ve gone silent, they won’t talk to me anymore.

“But I do anticipate some robust pushback from Cambridge Analytica because this is sort of an existential crisis for them,” he added. “But I think that I have a fairly robust public interest defense to breaking that NDA and that undertaking of confidentiality [that he previously signed with CA].”

The committee also pressed Wylie on whether he himself had had access to the Facebook data he claims CA used to build its targeting models. Wylie said that he had, though he claims he deleted his copy of the data “some time in 2015”.

During the testimony Wylie also suggested Facebook might have found out about the GSR data harvesting project as early as July 2014, because he says Kogan told him, around that time, that he had spoken to Facebook engineers after his app’s data collection rate had been throttled by the platform.

“He told me that he had a conversation with some engineers at Facebook,” said Wylie. “So Facebook would have known from that moment about the project because he had a conversation with Facebook’s engineers — or at least that’s what he told me… Facebook’s account of it is that they had no idea until the Guardian first reported it at the end of 2015 — and then they decided to send out letters. They sent letters to me in August 2016 asking do you know where this data might be, or was it deleted?

“It’s interesting that… the date of the letter is the same month that Cambridge Analytica officially joined the Trump campaign. So I’m not sure if Facebook was genuinely concerned about the data or just the optics of y’know now this firm is not just some random firm in Britain, it’s now working for a presidential campaign.”

We also asked Facebook if it had any general response to Wylie’s testimony but at the time of writing the company had not responded to this request for comment either.

Did Facebook make any efforts to retrieve or delete data, the committee also asked Wylie. “No they didn’t,” he replied. “Not to my knowledge. They certainly didn’t with me — until after I went public and then they made me suspect number one despite the fact the ICO [UK’s Information Commissioner’s Office] wrote to me and to Facebook saying that no I’ve actually given over everything to the authorities.”

“I suspect that when Facebook looked at what happened in 2016… they went if we make a big deal of this this might be optically not the best thing to make a big fuss about,” he said. “So I don’t think they pushed it in part because if you want to really investigate a large data breach that’s going to get out and that might cause problems. So my impression was they wanted to push it under the rug.”

“All kinds of people [had] access to the data,” he added. “It was everywhere.”