
ChatGPT-maker OpenAI accused of string of data protection breaches in GDPR complaint filed by privacy researcher


[Image: ChatGPT welcome screen. Image Credits: Leon Neal / Getty Images]

Questions about ChatGPT-maker OpenAI’s ability to comply with European privacy rules are in the frame again after a detailed complaint was filed with the Polish data protection authority yesterday.

The complaint, which TechCrunch has reviewed, alleges the U.S.-based AI giant is in breach of the bloc’s General Data Protection Regulation (GDPR) across a sweep of dimensions: lawful basis, transparency, fairness, data access rights, and privacy by design are all areas where it argues OpenAI is infringing EU privacy rules (that is, Articles 5(1)(a), 12, 15, 16 and 25(1) of the GDPR).

Indeed, the complaint frames the novel generative AI technology, and its maker’s approach to developing and operating the viral tool, as essentially a systematic breach of the pan-EU regime. It also suggests OpenAI overlooked a further GDPR requirement: prior consultation with regulators (Article 36). Had the company conducted a proactive assessment that identified high risks to people’s rights unless mitigating measures were applied, those findings should have given it pause for thought. Yet OpenAI apparently rolled ahead and launched ChatGPT in Europe without engaging with local regulators, which could have ensured it avoided falling foul of the bloc’s privacy rulebook.

This is not the first GDPR concern lobbed in ChatGPT’s direction, of course. Italy’s privacy watchdog, the Garante, generated headlines earlier this year after it ordered OpenAI to stop processing data locally — directing the US-based company to tackle a preliminary list of problems it identified in areas including lawful basis, information disclosures, user controls and child safety.

ChatGPT was able to resume offering a service in Italy fairly quickly after it tweaked its presentation. But the Italian DPA’s investigation continues and it remains to be seen what compliance conclusions may emerge once that assessment has been completed. Other EU DPAs are also probing ChatGPT. And in April the bloc’s data protection authorities formed a task force, via the European Data Protection Board (EDPB), to collectively consider how they should approach regulating the fast-developing tech.

That effort is ongoing — and it’s by no means certain a harmonized approach to oversight of ChatGPT and other AI chatbots will emerge — but, whatever happens there, the GDPR is still law and still in force. So anyone in the EU who feels their rights are being trampled by Big AI grabbing their data for training models that may spit out falsities about them can raise concerns with their local DPA and press for regulators to investigate, as is happening here.

OpenAI does not have a main establishment in any EU Member State for the purpose of GDPR oversight, which means it remains exposed to regulatory risk in this area across the bloc. It could therefore face outreach from DPAs acting on complaints from individuals anywhere in the EU.

Confirmed violations of the GDPR, meanwhile, can attract penalties as high as 4% of global annual turnover. DPAs’ corrective orders may also end up reworking how technologies function if they wish to continue operating inside the bloc.


Complaint of unlawful processing for AI training

The 17-page complaint filed yesterday with the Polish DPA is the work of Lukasz Olejnik, a security and privacy researcher, who is being represented for the complaint by Warsaw-based law firm, GP Partners.

Olejnik tells TechCrunch he became concerned after he used ChatGPT to generate a biography of himself and found it produced a text that contained some errors. He sought to contact OpenAI, towards the end of March, to point out the errors and ask for the inaccurate information about him to be corrected. He also asked it to provide him with a bundle of information that the GDPR empowers individuals to get from entities processing their data when the information has been obtained from somewhere other than themselves, as was the case here.

Per the complaint, a series of email exchanges took place between Olejnik and OpenAI between March and June of this year. And while OpenAI provided some information in response to the Subject Access Request (SAR), Olejnik’s complaint argues it failed to produce all the information required under the law, notably omitting information about its processing of personal data for AI model training.

Under the GDPR, for personal data processing to be lawful the data controller needs a valid legal basis, which must be transparently communicated. So obfuscation is not a good compliance strategy. Not least because the regulation attaches the principle of fairness to the lawfulness of processing, meaning anyone playing tricks to conceal the true extent of personal data processing will fall foul of the law too.

Olejnik’s complaint therefore asserts OpenAI breached Article 5(1)(a). Or, more simply, he argues the company processed his data “unlawfully, unfairly, and in a non-transparent manner”. “From the facts of the case, it appears that OpenAI systemically ignores the provisions of the GDPR regarding the processing of data for the purposes of training models within ChatGPT, a result of which, among other things, was that Mr. Łukasz Olejnik was not properly informed about the processing of his personal data,” the complaint notes. 

It also accuses OpenAI of acting in an “untrustworthy, dishonest, and perhaps unconscientious manner” by failing to be able to comprehensively detail how it has processed people’s data.

“Although OpenAI indicates that the data used to train the [AI] models includes personal data, OpenAI does not actually provide any information about the processing operations involving this data. OpenAI thus violates a fundamental element of the right under Article 15 GDPR, i.e., the obligation to confirm that personal data is being processed,” runs another relevant chunk of the complaint (which has been translated into English from Polish using machine translation).

“Notably, OpenAI did not include the processing of personal data in connection with model training in the information on categories of personal data or categories of data recipients. Providing a copy of the data also did not include personal data processed for training language models. As it seems, the fact of processing personal data for model training OpenAI hides or at least camouflages intentionally. This is also apparent from OpenAI’s Privacy Policy, which omits in the substantive part the processes involved in processing personal data for training language models.

“OpenAI reports that it does not use so-called ‘training’ data to identify individuals or remember their information, and is working to reduce the amount of personal data processed in the ‘training’ dataset. Although these mechanisms positively affect the level of protection of personal data and comply with the principle of minimization (Article 5(1)(c) of the GDPR), their application does not change the fact that ‘training’ data are processed and include personal data. The provisions of GDPR apply to the processing operations of such data, including the obligation to grant the data subject access to the data and provide the information indicated in Article 15(1) of GDPR.”

It’s a matter of record that OpenAI did not ask individuals whose personal data it may have processed as training data for permission to use their information when it was developing its AI chatbot. Nor did it inform the likely millions (or even billions) of people whose information it ingested in order to develop a commercial generative AI tool. That likely explains its lack of transparency when asked, via Olejnik’s SAR, to produce information about this aspect of its data processing operations.

However, as noted above, the GDPR requires not only a lawful basis for processing people’s data but transparency and fairness vis-a-vis any such operations. So OpenAI appears to have got itself into a triple bind here. Although it remains to be seen how EU regulators will act on such complaints as they weigh how to respond to generative AI chatbots.

Right to correct personal data ignored

Another aspect of Olejnik’s beef with OpenAI fixes on errors ChatGPT generated about him when asked to produce a biography, and its apparent inability to rectify these inaccuracies when asked. Instead of correcting falsehoods its tool generated about him, he says OpenAI initially responded to his request by blocking queries made to ChatGPT that referenced him, something he had not asked for.

Subsequently it told him it could not correct the errors. Yet the GDPR provides individuals with a right to rectification of their personal data.

“In the case of OpenAI and the processing of data to train models, this principle [rectification of personal data] is completely ignored in practice,” the complaint asserts. “This is evidenced by OpenAI’s response to Mr. Łukasz Olejnik’s request, according to which OpenAI was unable to correct the processed data. OpenAI’s systemic inability to correct data is assumed by OpenAI as part of ChatGPT’s operating model.”

Discussing disclosures related to this aspect of its operation contained in OpenAI’s privacy policy, the complaint goes on to argue: “Given the general and vague description of ChatGPT’s data validity mechanisms, it is highly likely that the inability to correct data is a systemic phenomenon in OpenAI’s data processing, and not just in limited cases.”

It further suggests there may be “reasonable doubts about the overall compliance with data protection regulations of a tool, an essential element of which is the systemic inaccuracy of the processed data”, adding: “These doubts are reinforced by the scale of ChatGPT’s processed data and the scale of potential recipients of personal data, which affect the risks to rights and freedoms associated with personal data inaccuracy.”

The complaint goes on to argue OpenAI “should develop and implement a data rectification mechanism based on an appropriate filter/module that would verify and correct content generated by ChatGPT (e.g., based on a database of corrected results)”, suggesting: “It is reasonable in the context of the scope of the obligation to ensure data accuracy to expect OpenAI to correct at least data reported or flagged by users as incorrect.”

“We believe that it is possible for OpenAI to develop adequate and GDPR-compliant mechanisms for correcting inaccurate data (it is already possible to block the generation of certain content as a result of a blockade imposed by OpenAI),” it adds. “However, if, in OpenAI’s opinion, it is not possible to develop such mechanisms — it would be necessary to consult the issue with the relevant supervisory authorities, including, for example, through the prior consultation procedure described in Article 36 of GDPR.”

Data protection incompatibility by design?

The complaint also seeks to spotlight what it views as a total violation of the GDPR’s principle of data protection by design and default.

“The way the ChatGPT tool was designed, taking into account also the violations described [earlier] in the complaint (in particular, the inability to exercise the right to rectify data, the omission of data processing operations for training GPT models) — contradicts all the indicated assumptions of the principle of data protection by design,” it argues. “In practice, in the case of data processing by OpenAI, there is testing of the ChatGPT tool using personal data, not in the design phase, but in the production environment (i.e., after the tool is made available to users).

“OpenAI seems to accept that the ChatGPT tool model that has been developed is simply incompatible with the provisions of GDPR, and it agrees to this state of affairs. This shows a complete disregard for the goals behind the principle of data protection by design.”

We’ve asked OpenAI to respond to the complaint’s claims that its AI chatbot violates the GDPR and also to confirm whether or not it produced a data protection impact assessment prior to launching ChatGPT.

Additionally, we’ve asked it to explain why it did not seek prior consultation with EU regulators for guidance on how to develop such a high-risk technology in a way that could have mitigated GDPR risks. At the time of writing it had not responded to our questions but we’ll update this report if we get a response.

We also reached out to the Polish DPA, the UODO, about the complaint. A spokesperson for the UODO confirmed receipt of the complaint — which they said it is now analyzing to decide on further actions. They also confirmed it is the first such complaint the authority has received regarding ChatGPT. And said they have not previously had any correspondence with OpenAI regarding ChatGPT’s GDPR compliance.

“The [UODO] has been looking at generative AI tools for a long time in light of the requirements of the GDPR regarding lawful, fair and transparent processing of personal data and data access rights,” the spokesperson also told us. “The authority is wondering how artificial intelligence systems should be designed in accordance with the GDPR, and how to determine the relationship between the GDPR and the [EU] AI Act. New legal regulations, such as the AI Act, which limit the impact of AI on the analysis of biometric data or human emotional states allow us to look into the future with hope.

“We expect that the AI Act will allow us to protect fundamental rights against inappropriately functioning AI algorithms. Nevertheless, at the same time, the Personal Data Protection Office is aware that due to the possibility of automatic decision-making based on data analysis, there may be a risk of inappropriate use of AI, e.g. to manipulate public opinion, spread false information or discriminate against certain social groups. These are the challenges that the [UODO] has to face. The Office also reminds that customers must be informed how their data is used and processed by AI and be able to consent to their use.”

The spokesperson also emphasized the importance of conducting a data protection impact assessment, stressing that “particular emphasis should be placed on DPIA”. “A personal data controller who uses tools such as ChatGPT should apply a risk-based approach and conduct a data protection impact assessment before starting to process data using artificial intelligence,” they added.

They further confirmed that the authority has joined the dedicated EDPB task force looking at ChatGPT’s GDPR compliance, saying the effort aims to “foster cooperation and exchange information on possible enforcement actions conducted by data protection authorities regarding the ChatGPT service and provide the platform for its joint analysis on the EU level”. 

Discussing their own expectations for the complaint, Olejnik’s lawyer, Maciej Gawronski, suggests the length of time it could take the Polish regulator to investigate could be “anything from six months to two years”.

“Provided UODO confirms violation of the GDPR we would expect UODO to primarily order OpenAI to exercise Mr Olejnik’s rights,” he told us. “In addition, as we argue that some of OpenAI’s violations may be systemic, we hope the DPA will investigate the processing thoroughly and, if justified, order OpenAI to act in compliance with the GDPR so that data processing operations within ChatGPT are lawful in a more universal perspective.”

Gawronski also takes the view that OpenAI has failed to apply Article 36 of the GDPR — since it did not engage in a process of prior consultation with the UODO or any other European DPA before launching ChatGPT — adding: “We would expect UODO to force OpenAI into engaging into a similar process now.”

In another step, the complaint urges the Polish regulator to require OpenAI to submit a data protection impact assessment (DPIA) with details of its processing of personal data for purposes related to ChatGPT — describing this document, which is a standard feature of data protection compliance in Europe, as an “important element” for assessing whether the tool is compliant with the GDPR. 

For his part, Olejnik says his hope in bringing the complaint against OpenAI and ChatGPT is that he will be able to properly exercise all the GDPR rights he has found himself unable to so far.

“During this journey I felt kind of like Josef K. in Kafka’s The Trial,” he told us. “Fortunately, in Europe there’s a system in place to avoid such a feeling. I trust that the GDPR process does work!”

This report was updated with comment from the Polish data protection authority.

