OpenAI launches a red teaming network to make its models more robust

In its ongoing effort to make its AI systems more robust, OpenAI today launched the OpenAI Red Teaming Network, a contracted group of experts to help inform the company’s AI model risk assessment and mitigation strategies.

Red teaming is becoming an increasingly key step in the AI model development process as AI technologies, particularly generative technologies, enter the mainstream. Red teaming can catch (albeit not fix, necessarily) biases in models like OpenAI’s DALL-E 2, which has been found to amplify stereotypes around race and sex, and prompts that can cause text-generating models, including models like ChatGPT and GPT-4, to ignore safety filters.

OpenAI notes that it’s worked with outside experts to benchmark and test its models before, including people participating in its bug bounty program and researcher access program. However, the Red Teaming Network formalizes those efforts, with the goal of “deepening” and “broadening” OpenAI’s work with scientists, research institutions and civil society organizations, the company says in a blog post.

“We see this work as a complement to externally-specified governance practices, such as third-party audits,” OpenAI writes. “Members of the network will be called upon based on their expertise to help red team at various stages of the model and product development lifecycle.”

Outside of red teaming campaigns commissioned by OpenAI, the company says that Red Teaming Network members will have the opportunity to engage with each other on general red teaming practices and findings. Not every member will be involved with every new OpenAI model or product, and time contributions — which could be as few as 5 to 10 hours a year — will be determined with members individually, OpenAI says.

OpenAI’s calling on a broad range of domain experts to participate, including those with backgrounds in linguistics, biometrics, finance and healthcare. It isn’t requiring prior experience with AI systems or language models for eligibility. But the company warns that Red Teaming Network opportunities might be subject to non-disclosure and confidentiality agreements that could impact other research.

“What we value most is your willingness to engage and bring your perspective to how we assess the impacts of AI systems,” OpenAI writes. “We invite applications from experts from around the world and are prioritizing geographic as well as domain diversity in our selection process.”

The question is, is red teaming enough? Some argue that it isn’t.

In a recent piece, Wired contributor Aviv Ovadya, an affiliate with Harvard’s Berkman Klein Center and the Centre for the Governance of AI, makes the case for “violet teaming”: identifying how a system (e.g. GPT-4) might harm an institution or public good and then supporting the development of tools using that same system to defend the institution and public good. I’m inclined to agree it’s a wise idea. But, as Ovadya points out in his column, there are few incentives to do violet teaming, let alone slow down AI releases enough to allow sufficient time for it to work.

Red teaming networks like OpenAI’s seem to be the best we’ll get — at least for now.