OpenAI debates when to release its AI-generated image detector

OpenAI has “discussed and debated quite extensively” when to release a tool that can determine whether an image was made with DALL-E 3, OpenAI’s generative AI art model. But the startup isn’t close to making a decision.

That’s according to Sandhini Agarwal, an OpenAI researcher focused on safety and policy, who spoke with TechCrunch in a phone interview this week. She said that, while the classifier tool’s accuracy is “really good,” at least by her estimation, it hasn’t met OpenAI’s threshold for quality.

“There’s this question of putting out a tool that’s somewhat unreliable, given that decisions it could make could significantly affect photos, like whether a work is viewed as painted by an artist or inauthentic and misleading,” Agarwal said.

OpenAI’s targeted accuracy for the tool appears to be extraordinarily high. Mira Murati, OpenAI’s chief technology officer, said this week at The Wall Street Journal’s Tech Live conference that the classifier is “99%” reliable at determining if an unmodified photo was generated using DALL-E 3. Perhaps the goal is 100%; Agarwal wouldn’t say.
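To see why even “99%” might fall short of OpenAI’s bar, it helps to run the arithmetic on false alarms. The sketch below is illustrative only: the function name, the batch size and the 1% share of DALL-E 3 images are hypothetical, and it assumes “99%” means the classifier is right 99% of the time on both generated and non-generated images, which OpenAI hasn’t confirmed.

```python
def flagged_breakdown(total_images, ai_share, true_positive_rate, false_positive_rate):
    """Estimate how a classifier's flags split between correct and false alarms.

    All inputs are assumptions for illustration, not OpenAI's figures.
    Returns (correct_flags, false_flags, precision).
    """
    ai_images = total_images * ai_share
    other_images = total_images - ai_images

    # Generated images the classifier catches.
    correct_flags = ai_images * true_positive_rate
    # Non-generated images the classifier wrongly flags.
    false_flags = other_images * false_positive_rate

    flagged = correct_flags + false_flags
    # Share of flagged images that really were generated.
    precision = correct_flags / flagged if flagged else 0.0
    return correct_flags, false_flags, precision


# Hypothetical scenario: 1 million images, 1% of them made with DALL-E 3,
# and a classifier that is 99% accurate in both directions.
correct, false_alarms, precision = flagged_breakdown(
    total_images=1_000_000,
    ai_share=0.01,
    true_positive_rate=0.99,
    false_positive_rate=0.01,
)
print(f"correct flags: {correct:.0f}")
print(f"false alarms:  {false_alarms:.0f}")
print(f"share of flags that are right: {precision:.0%}")
```

Under those assumed numbers, the wrongly flagged images are as numerous as the correctly flagged ones, so about half of all flags would land on work that wasn’t generated by DALL-E 3. The rarer generated images are in a given pool, the worse that ratio gets.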

A draft OpenAI blog post shared with TechCrunch revealed this interesting tidbit:

“[The classifier] remains over 95% accurate when [an] image has been subject to common types of modifications, such as cropping, resizing, JPEG compression, or when text or cutouts from real images are superimposed onto small portions of the generated image.”

OpenAI’s reluctance could be tied to the controversy surrounding its previous public classifier tool, which was designed to detect AI-generated text not only from OpenAI’s models, but also from text-generating models released by third-party vendors. OpenAI pulled the AI-written text detector over its “low rate of accuracy,” which had been widely criticized.

Agarwal implied that OpenAI is also hung up on the philosophical question of what, exactly, constitutes an AI-generated image. Artwork generated from scratch by DALL-E 3 qualifies, obviously. But what about an image from DALL-E 3 that’s gone through several rounds of edits, been combined with other images and then run through a few post-processing filters? It’s less clear.

An image generated by DALL-E 3. Image Credits: OpenAI

“At that point, should that image be considered something AI-generated or not?” Agarwal said. “Right now, we’re trying to navigate this question, and we really want to hear from artists and people who’d be significantly impacted by such [classifier] tools.”

A number of organizations — not just OpenAI — are exploring watermarking and detection techniques for generative media as AI deepfakes proliferate.

DeepMind recently proposed a spec, SynthID, to mark AI-generated images in a way that’s imperceptible to the human eye but can be spotted by a specialized detector. French startup Imatag, launched in 2020, offers a watermarking tool that it claims, like SynthID, isn’t affected by resizing, cropping, editing or compressing images. Yet another firm, Steg.AI, employs an AI model to apply watermarks that survive resizing and other edits.

The problem is, the industry has yet to coalesce around a single watermarking or detection standard. Even if it does, there’s no guarantee that the watermarks, and the detectors for that matter, won’t be defeatable.

I asked Agarwal whether OpenAI’s image classifier would ever support detecting images created with other, non-OpenAI generative tools. She wouldn’t commit to that, but did say that — depending on the reception of the image classifier tool as it exists today — it’s an avenue OpenAI would consider exploring.

“One of the reasons why right now [the classifier is] DALL-E 3-specific is because that’s, technically, a much more tractable problem,” Agarwal said. “[A general detector] isn’t something we’re doing right now… But depending on where [the classifier tool] goes, I’m not saying we’ll never do it.”