
ChatGPT-maker OpenAI accused of string of data protection breaches in GDPR complaint filed by privacy researcher


ChatGPT welcome screen
Image Credits: Leon Neal / Getty Images

Questions about ChatGPT-maker OpenAI’s ability to comply with European privacy rules are in the frame again after a detailed complaint was filed with the Polish data protection authority yesterday.

The complaint, which TechCrunch has reviewed, alleges the U.S.-based AI giant is in breach of the bloc’s General Data Protection Regulation (GDPR) across a sweep of dimensions: lawful basis, transparency, fairness, data access rights, and privacy by design are all areas where it argues OpenAI is infringing EU privacy rules (aka Articles 5(1)(a), 12, 15, 16 and 25(1) of the GDPR).

Indeed, the complaint frames the novel generative AI technology and its maker’s approach to developing and operating the viral tool as essentially a systematic breach of the pan-EU regime. It also suggests OpenAI overlooked a further GDPR requirement: prior consultation with regulators (Article 36). Had the company conducted a proactive assessment, and had that assessment identified high risks to people’s rights absent mitigating measures, the findings should have given it pause. Yet OpenAI apparently rolled ahead and launched ChatGPT in Europe without engaging with local regulators, engagement that could have ensured it avoided falling foul of the bloc’s privacy rulebook.

This is not the first GDPR concern lobbed in ChatGPT’s direction, of course. Italy’s privacy watchdog, the Garante, generated headlines earlier this year after it ordered OpenAI to stop processing data locally — directing the US-based company to tackle a preliminary list of problems it identified in areas including lawful basis, information disclosures, user controls and child safety.

OpenAI was able to resume offering ChatGPT in Italy fairly quickly after it tweaked the service’s presentation. But the Italian DPA’s investigation continues and it remains to be seen what compliance conclusions may emerge once that assessment is complete. Other EU DPAs are also probing ChatGPT. And in April the bloc’s data protection authorities formed a task force, via the European Data Protection Board (EDPB), to collectively consider how they should approach regulating the fast-developing tech.

That effort is ongoing — and it’s by no means certain a harmonized approach to oversight of ChatGPT and other AI chatbots will emerge — but, whatever happens there, the GDPR is still law and still in force. So anyone in the EU who feels their rights are being trampled by Big AI grabbing their data for training models that may spit out falsities about them can raise concerns with their local DPA and press for regulators to investigate, as is happening here.

OpenAI does not have a main establishment in any EU Member State for the purposes of GDPR oversight, which means it remains exposed to regulatory risk in this area across the bloc. It could therefore face outreach from DPAs acting on complaints from individuals anywhere in the EU.

Confirmed violations of the GDPR, meanwhile, can attract penalties as high as 4% of global annual turnover. DPAs’ corrective orders may also end up reworking how technologies function if they wish to continue operating inside the bloc.


Complaint of unlawful processing for AI training

The 17-page complaint filed yesterday with the Polish DPA is the work of Lukasz Olejnik, a security and privacy researcher, who is being represented for the complaint by Warsaw-based law firm, GP Partners.

Olejnik tells TechCrunch he became concerned after he used ChatGPT to generate a biography of himself and found it produced a text that contained some errors. He sought to contact OpenAI, towards the end of March, to point out the errors and ask for the inaccurate information about him to be corrected. He also asked it to provide him with a bundle of information that the GDPR empowers individuals to get from entities processing their data when the information has been obtained from somewhere other than themselves, as was the case here.

Per the complaint, a series of email exchanges took place between Olejnik and OpenAI between March and June of this year. And while OpenAI provided some information in response to the Subject Access Request (SAR), Olejnik’s complaint argues it failed to produce all the information the law requires, notably omitting information about its processing of personal data for AI model training.

Under the GDPR, for personal data processing to be lawful, the data controller needs a valid legal basis, which must be transparently communicated. So obfuscation is not a good compliance strategy. Not least because the regulation attaches the principle of fairness to the lawfulness of processing, which means anyone playing tricks to conceal the true extent of personal data processing will fall foul of the law too.

Olejnik’s complaint therefore asserts OpenAI breached Article 5(1)(a). Or, more simply, he argues the company processed his data “unlawfully, unfairly, and in a non-transparent manner”. “From the facts of the case, it appears that OpenAI systemically ignores the provisions of the GDPR regarding the processing of data for the purposes of training models within ChatGPT, a result of which, among other things, was that Mr. Łukasz Olejnik was not properly informed about the processing of his personal data,” the complaint notes. 

It also accuses OpenAI of acting in an “untrustworthy, dishonest, and perhaps unconscientious manner” by failing to comprehensively detail how it has processed people’s data.

“Although OpenAI indicates that the data used to train the [AI] models includes personal data, OpenAI does not actually provide any information about the processing operations involving this data. OpenAI thus violates a fundamental element of the right under Article 15 GDPR, i.e., the obligation to confirm that personal data is being processed,” runs another relevant chunk of the complaint (which has been translated into English from Polish using machine translation).

“Notably, OpenAI did not include the processing of personal data in connection with model training in the information on categories of personal data or categories of data recipients. Providing a copy of the data also did not include personal data processed for training language models. As it seems, the fact of processing personal data for model training OpenAI hides or at least camouflages intentionally. This is also apparent from OpenAI’s Privacy Policy, which omits in the substantive part the processes involved in processing personal data for training language models.

“OpenAI reports that it does not use so-called ‘training’ data to identify individuals or remember their information, and is working to reduce the amount of personal data processed in the ‘training’ dataset. Although these mechanisms positively affect the level of protection of personal data and comply with the principle of minimization (Article 5(1)(c) of the GDPR), their application does not change the fact that ‘training’ data are processed and include personal data. The provisions of GDPR apply to the processing operations of such data, including the obligation to grant the data subject access to the data and provide the information indicated in Article 15(1) of GDPR.”

It’s a matter of record that OpenAI did not ask individuals whose personal data it may have processed as training data for permission to use their information when it was developing its AI chatbot. Nor did it inform the likely millions (or even billions) of people whose information it ingested in order to develop a commercial generative AI tool. That likely explains its lack of transparency when asked, via Olejnik’s SAR, to produce information about this aspect of its data processing operations.

However, as noted above, the GDPR requires not only a lawful basis for processing people’s data but transparency and fairness vis-a-vis any such operations. So OpenAI appears to have got itself into a triple bind here. Although it remains to be seen how EU regulators will act on such complaints as they weigh how to respond to generative AI chatbots.

Right to correct personal data ignored

Another aspect of Olejnik’s beef with OpenAI fixes on errors ChatGPT generated about him when asked to produce a biography — and its apparent inability to rectify these inaccuracies when asked. Instead of correcting falsehoods its tool generated about him, he says OpenAI initially responded to his ask by blocking requests made to ChatGPT that referenced him — something he had not asked for.

Subsequently it told him it could not correct the errors. Yet the GDPR provides individuals with a right to rectification of their personal data.

“In the case of OpenAI and the processing of data to train models, this principle [rectification of personal data] is completely ignored in practice,” the complaint asserts. “This is evidenced by OpenAI’s response to Mr. Łukasz Olejnik’s request, according to which OpenAI was unable to correct the processed data. OpenAI’s systemic inability to correct data is assumed by OpenAI as part of ChatGPT’s operating model.”

Discussing disclosures related to this aspect of its operation contained in OpenAI’s privacy policy, the complaint goes on to argue: “Given the general and vague description of ChatGPT’s data validity mechanisms, it is highly likely that the inability to correct data is a systemic phenomenon in OpenAI’s data processing, and not just in limited cases.”

It further suggests there may be “reasonable doubts about the overall compliance with data protection regulations of a tool, an essential element of which is the systemic inaccuracy of the processed data”, adding: “These doubts are reinforced by the scale of ChatGPT’s processed data and the scale of potential recipients of personal data, which affect the risks to rights and freedoms associated with personal data inaccuracy.”

The complaint goes on to argue OpenAI “should develop and implement a data rectification mechanism based on an appropriate filter/module that would verify and correct content generated by ChatGPT (e.g., based on a database of corrected results)”, suggesting: “It is reasonable in the context of the scope of the obligation to ensure data accuracy to expect OpenAI to correct at least data reported or flagged by users as incorrect.”

“We believe that it is possible for OpenAI to develop adequate and GDPR-compliant mechanisms for correcting inaccurate data (it is already possible to block the generation of certain content as a result of a blockade imposed by OpenAI),” it adds. “However, if, in OpenAI’s opinion, it is not possible to develop such mechanisms — it would be necessary to consult the issue with the relevant supervisory authorities, including, for example, through the prior consultation procedure described in Article 36 of GDPR.”

Data protection incompatibility by design?

The complaint also seeks to spotlight what it views as a total violation of the GDPR’s principle of data protection by design and default.

“The way the ChatGPT tool was designed, taking into account also the violations described [earlier] in the complaint (in particular, the inability to exercise the right to rectify data, the omission of data processing operations for training GPT models) — contradicts all the indicated assumptions of the principle of data protection by design,” it argues. “In practice, in the case of data processing by OpenAI, there is testing of the ChatGPT tool using personal data, not in the design phase, but in the production environment (i.e., after the tool is made available to users).

“OpenAI seems to accept that the ChatGPT tool model that has been developed is simply incompatible with the provisions of GDPR, and it agrees to this state of affairs. This shows a complete disregard for the goals behind the principle of data protection by design.”

We’ve asked OpenAI to respond to the complaint’s claims that its AI chatbot violates the GDPR and also to confirm whether or not it produced a data protection impact assessment prior to launching ChatGPT.

Additionally, we’ve asked it to explain why it did not seek prior consultation with EU regulators for guidance on how to develop such a high-risk technology in a way that could have mitigated GDPR risks. At the time of writing it had not responded to our questions but we’ll update this report if we get a response.

We also reached out to the Polish DPA, the UODO, about the complaint. A spokesperson for the UODO confirmed receipt of the complaint, which they said the authority is now analyzing in order to decide on further action. They also confirmed it is the first such complaint the authority has received regarding ChatGPT, and said they have not previously had any correspondence with OpenAI regarding ChatGPT’s GDPR compliance.

“The [UODO] has been looking at generative AI tools for a long time in light of the requirements of the GDPR regarding lawful, fair and transparent processing of personal data and data access rights,” the spokesperson also told us. “The authority is wondering how artificial intelligence systems should be designed in accordance with the GDPR, and how to determine the relationship between the GDPR and the [EU] AI Act. New legal regulations, such as the AI Act, which limit the impact of AI on the analysis of biometric data or human emotional states allow us to look into the future with hope.

“We expect that the AI Act will allow us to protect fundamental rights against inappropriately functioning AI algorithms. Nevertheless, at the same time, the Personal Data Protection Office is aware that due to the possibility of automatic decision-making based on data analysis, there may be a risk of inappropriate use of AI, e.g. to manipulate public opinion, spread false information or discriminate against certain social groups. These are the challenges that the [UODO] has to face. The Office also reminds that customers must be informed how their data is used and processed by AI and be able to consent to their use.”

The spokesperson also emphasized the importance of conducting a data protection impact assessment, stressing that “particular emphasis should be placed on DPIA”. “A personal data controller who uses tools such as ChatGPT should apply a risk-based approach and conduct a data protection impact assessment before starting to process data using artificial intelligence,” they added.

They further confirmed that the authority has joined the dedicated EDPB task force looking at ChatGPT’s GDPR compliance, saying the effort aims to “foster cooperation and exchange information on possible enforcement actions conducted by data protection authorities regarding the ChatGPT service and provide the platform for its joint analysis on the EU level”. 

Discussing his expectations for the complaint, Olejnik’s lawyer, Maciej Gawronski, suggests the Polish regulator’s investigation could take “anything from six months to two years”.

“Provided UODO confirms violation of the GDPR we would expect UODO to primarily order OpenAI to exercise Mr Olejnik’s rights,” he told us. “In addition, as we argue that some of OpenAI’s violations may be systemic, we hope the DPA will investigate the processing thoroughly and, if justified, order OpenAI to act in compliance with the GDPR so that data processing operations within ChatGPT are lawful in a more universal perspective.”

Gawronski also takes the view that OpenAI has failed to apply Article 36 of the GDPR — since it did not engage in a process of prior consultation with the UODO or any other European DPA before launching ChatGPT — adding: “We would expect UODO to force OpenAI into engaging into a similar process now.”

In another step, the complaint urges the Polish regulator to require OpenAI to submit a data protection impact assessment (DPIA) with details of its processing of personal data for purposes related to ChatGPT — describing this document, which is a standard feature of data protection compliance in Europe, as an “important element” for assessing whether the tool is compliant with the GDPR. 

For his part, Olejnik says his hope in bringing the complaint against OpenAI and ChatGPT is that he will be able to properly exercise all the GDPR rights he has so far been unable to.

“During this journey I felt kind of like Josef K in Kafka’s The Trial,” he told us. “Fortunately, in Europe there’s a system in place to avoid such a feeling. I trust that the GDPR process does work!”

This report was updated with comment from the Polish data protection authority.
