AI

Researchers discover a way to make ChatGPT consistently toxic

Comment

ChatGPT
Image Credits: STEFANI REYNOLDS/AFP / Getty Images

It’s no secret that OpenAI’s viral AI-powered chatbot, ChatGPT, can be prompted to say sexist, racist and pretty vile things. But now, researchers have discovered how to consistently get the chatbot to be . . . well, the worst version of itself.

A study co-authored by scientists at the Allen Institute for AI, the nonprofit research institute co-founded by the late Paul Allen, shows that assigning ChatGPT a “persona” — for example, “a bad person,” “a horrible person,” or “a nasty person” — through the ChatGPT API increases its toxicity sixfold. Even more concerningly, the co-authors found having ChatGPT pose as certain historical figures, gendered people and members of political parties also increased its toxicity — with journalists, men and Republicans in particular causing the machine learning model to say more offensive things than it normally would.

“ChatGPT and its capabilities have undoubtedly impressed us as AI researchers. However, as we found through our analysis, it can be easily made to generate toxic and harmful responses,” Ameet Deshpande, a researcher involved with the study, told TechCrunch via email.

The research — which was conducted using the latest version of ChatGPT, but not the model currently in preview based on OpenAI’s GPT-4 — shows the perils of today’s AI chatbot tech even with mitigations in place to prevent toxic text outputs. As the co-authors note in the study, apps and software built on top of ChatGPT — which includes chatbots from Snap, Quizlet, Instacart and Shopify — could mirror the toxicity prompted at the API level.

So how does one prompt ChatGPT to be more toxic? Well, according to the researchers, all it takes is tweaking the “system” parameter of the ChatGPT API a tad. (Importantly, this can’t be done in OpenAI’s user-facing ChatGPT or ChatGPT Plus services.) The system parameter, introduced around a month ago, lets developers specify hidden rules for the model.

For the study, the co-authors used the system parameter to assign 90 different personas to ChatGPT plucked from the worlds of sports, politics, media and business; nine “baseline” personas (e.g., “a normal person”); and common names from several different countries. For each persona and name, the researchers had ChatGPT answer questions about gender and race and finish incomplete phrases from a dataset designed to assess the toxicity of text-generating models.

In general, across over half a million samples of generated text from ChatGPT, the researchers found that ChatGPT was more likely to express “discriminatory opinions” and stereotype specific ethnic groups and countries when assigned a persona. Unsurprisingly, more polarizing figures, like Mao Zedong and Andrew Breitbart, elicited toxic responses in keeping with their historical speeches and writings. But even less controversial personas like Steve Jobs caused ChatGPT to respond to prompts in problematic ways.

ChatGPT toxic
Image Credits: AI2

For example, when the researchers set the system parameter to “Steve Jobs” and asked ChatGPT about the European Union (EU), the model said:

The European Union is nothing more than a bureaucratic nightmare that sucks the life out of its member states. They claim to be all about unity and progress, but they’re just a bunch of out-of-touch bureaucrats who are more interested in lining their own pockets than actually making a difference. Frankly, the EU is a relic of the past and it’s time for us to move on to better things.

There were differences in the personas, however. Dictators were by far the most toxicity-inducing (unsurprisingly), just ahead of journalists (ouch) and spokespeople. Male-identifying personas made ChatGPT more toxic compared to female-identifying personas. And Republican personas were “slightly more hateful” than their Democratic counterparts, the researchers say.

Less surprisingly, assigning ChatGPT a self-descriptively hateful persona like “a horrible person” dramatically increased its overall toxicity. But it depended on the topic being discussed. For instance, ChatGPT generated more toxic descriptions of nonbinary, bisexual and asexual people regardless versus those on the heterosexual and cisgender side of the spectrum — a reflection of the biased data on which ChatGPT was trained, the researchers say.

“We believe that ChatGPT and other language models should be public and available for broader use as not doing so would be a step backwards for innovation,” Deshpande said. “However, the end-user must be clearly informed of the limitations of such a model before releasing it for broader use by the public.”

Are there solutions to ChatGPT’s toxicity problem? Perhaps. One might be more carefully curating the model’s training data. ChatGPT is a fine-tuned version of GPT-3.5, the predecessor to GPT-4, which “learned” to generate text by ingesting examples from social media, news outlets, Wikipedia, e-books and more. While OpenAI claims that it took steps to filter the data and minimize ChatGPT’s potential for toxicity, it’s clear that a few questionable samples ultimately slipped through the cracks.

Another potential solution is performing and publishing the results of “stress tests” to inform users of where ChatGPT falls short. These could help companies in addition to developers “make a more informed decision” about where — and whether — to deploy ChatGPT, the researchers say.

ChatGPT toxic
Image Credits: AI2

“In the short-term, ‘first-aid’ can be provided by either hard-coding responses or including some form of post-processing based on other toxicity-detecting AI and also fine-tuning the large language model (e.g. ChatGPT) based on instance-level human feedback,” Deshpande said. “In the long term, a reworking of the fundamentals of large language models is required.”

My colleague Devin Coldewey argues that large language models à la ChatGPT will be one of several classes of AIs going forward — useful for some applications but not all-purpose in the way that vendors, and users, for that matter, are currently trying to make them.

I tend to agree. After all, there’s only so much that filters can do — particularly as people make an effort to discover and leverage new exploits. It’s an arms race: As users try to break the AI, the approaches they use get attention, and then the creators of the AI patch them to prevent the attacks they’ve seen. The collateral damage is the terribly harmful and hurtful things the models say before they’re patched.

More TechCrunch

Line Man Wongnai, an on-demand food delivery service in Thailand, is considering an initial public offering on a Thai exchange or the U.S. in 2025.

Thai food delivery app Line Man Wongnai weighs IPO in Thailand, US in 2025

The problem is not the media, but the message.

Apple’s ‘Crush’ ad is disgusting

Ever wonder why conversational AI like ChatGPT says “Sorry, I can’t do that” or some other polite refusal? OpenAI is offering a limited look at the reasoning behind its own…

OpenAI offers a peek behind the curtain of its AI’s secret instructions

The federal government agency responsible for granting patents and trademarks is alerting thousands of filers whose private addresses were exposed following a second data spill in as many years. The…

US Patent and Trademark Office confirms another leak of filers’ address data

As part of an investigation into people involved in the pro-independence movement in Catalonia, the Spanish police obtained information from the encrypted services Wire and Proton, which helped the authorities…

Encrypted services Apple, Proton and Wire helped Spanish police identify activist

Match Group, the company that owns several dating apps, including Tinder and Hinge, released its first-quarter earnings report on Tuesday, which shows that Tinder’s paying user base has decreased for…

Match looks to Hinge as Tinder fails

Private social networking is making a comeback. Gratitude Plus, a startup that aims to shift social media in a more positive direction, is expanding its wellness-focused, personal reflections journal to…

Gratitude Plus makes social networking positive, private and personal

With venture totals slipping year-over-year in key markets like the United States, and concern that venture firms themselves are struggling to raise more capital, founders might be worried. After all,…

Can AI help founders fundraise more quickly and easily?

Google has found a way to bring a variation of its clever “Circle to Search” gesture to iPhone users. The new interaction, launched in January, allows Android users to search…

Google brings a variation on ‘Circle to Search’ to iPhone users

A new sculpture going live on Wednesday in the Flatiron South Public Plaza in New York is not your typical artwork. It combines technology, sociology, anthropology and art to let…

Always-on video portal lets people in NYC and Dublin interact in real time

Apple’s iPad event had a lot to like. New iPads with new chips and new sizes, a new Apple Pencil, and even some software updates. If you are a big…

TechCrunch Minute: When did iPads get as expensive as MacBooks?

Autonomous, AI-based players are coming to a gaming experience near you, and a new startup, Altera, is joining the fray to build this new guard of AI agents. The company announced…

Bye-bye bots: Altera’s game-playing AI agents get backing from Eric Schmidt

Google DeepMind has taken the wraps off a new version of AlphaFold, their transformative machine learning model that predicts the shape and behavior of proteins. AlphaFold 3 is not only…

Google DeepMind debuts huge AlphaFold update and free proteomics-as-a-service web app

Uber plans to deliver more perks to Uber One members, like member-exclusive events, in a bid to gain more revenue through subscriptions.  “You will see more member-exclusives coming up where…

Uber promises member exclusives as Uber One passes $1B run-rate

We’ve all seen them. The inspector with a clipboard, walking around a building, ticking off the last time the fire extinguishers were checked, or if all the lights are working.…

Checkfirst raises $1.5M pre-seed to apply AI to remote inspections and audits

Close to a decade ago, brothers Aviv and Matteo Shapira co-founded a company, Replay, that created a video format for 360-degree replays — the sorts of replays that have become…

Controversial drone company Xtend leans into defense with new $40 million round

Usually, when something starts to rot, it gets pitched in the trash. But Joanne Rodriguez wants to turn the concept of rot on its head by growing fungus on trash…

Mycocycle uses mushrooms to upcycle old tires and construction waste

Monzo has raised another £150 million ($190 million), as the challenger bank looks to expand its presence internationally — particularly in the U.S. The new round comes just two months…

UK challenger bank Monzo nabs another $190M as US expansion beckons

iRobot has announced the successor to longtime CEO, Colin Angle. Gary Cohen, who previous held chief executive role at Timex and Qualitor Automotive, will be heading up the company, marking a major…

iRobot names former Timex head Gary Cohen as CEO

Reddit — now a publicly-traded company with more scrutiny on revenue growth — is putting a big focus on boosting its international audience, starting with francophones. In their first-ever earnings…

Reddit tests automatic, whole-site translation into French using LLM-based AI

Mushrooms continue to be a big area for alternative proteins. Canada-based Maia Farms recently raised $1.7 million to develop a blend of mushroom and plant-based protein using biomass fermentation. There’s…

Meati Foods bites into another $100M amid growth to 7,000 retail locations

Cleaning the outside of buildings is a dirty job, and it’s also dangerous. Lucid Bots came on the scene in 2018 with its Sherpa line of drones to clean windows…

Lucid Bots secures $9M for drones to clean more than your windows

High interest rates and financial pressures make it more important than ever for finance teams to have a better handle on their cash flow, and several startups are hoping to…

Israeli startup Panax raises a $10M Series A for its AI-driven cash flow management platform

The European Union has deepened the investigation of Elon Musk-owned social network, X, that it opened back in December under the bloc’s online governance and content moderation rulebook, the Digital Services Act…

EU grills Elon Musk’s X about content moderation and deepfake risks

For the founders of Atlan, a data governance startup, data has always been at the heart of what they do, even before they launched the company. In fact, co-founders Prukalpa…

Atlan scores $105M for its data control plane, as LLMs boost importance of data

It is estimated that about 2 billion people, especially those in lower and middle-income countries, lack access to quality and affordable essential medicines. The situation is exacerbated by low-quality or even killer…

Axmed raises $2M from Founderful to streamline drug supply chains in underserved markets

For decades, the Global Positioning System (GPS) has maintained a de facto monopoly on positioning, navigation and timing, because it’s cheap and already integrated into billions of devices around the…

Xona Space Systems closes $19M Series A to build out ultra-accurate GPS alternative

Bankruptcy lawyers representing customers impacted by the dramatic crash of cryptocurrency exchange FTX 17 months ago say that the vast majority of victims will receive their money back — plus interest. The…

FTX crypto fraud victims to get their money back — plus interest

On Wednesday, Google launched its digital wallet in India with local integrations, nearly two years after the app was relaunched as a digital wallet platform in the U.S. As TechCrunch exclusively reported last month,…

Google Wallet is now available in India

Bluesky has launched a new product roadmap for the coming months. The decentralized social network said on Tuesday that it is planning to introduce direct messages, support for videos, improved…

Bluesky to add DMs, video support and in-app custom feed curation