Should artists be paid for training data? OpenAI VP wouldn't say

Should artists whose work was used to train generative AI like ChatGPT be compensated for their contributions? Peter Deng, VP of consumer product at OpenAI — the maker of ChatGPT — was loath to give an answer when asked on SXSW’s main stage this afternoon.

“That’s a great question,” he said when SignalFire venture partner (and former TechCrunch writer) Josh Constine, who interviewed Deng in a wide-ranging fireside chat, put it to him. Some in the crowd of onlookers shouted “yes” in response, which Deng acknowledged. “I’m hearing from the audience that they do. I’m hearing from the audience they do.”

That Deng dodged the question isn’t surprising. OpenAI is in a delicate legal position over how it uses data to train generative AI systems like the art-creating tool DALL-E 3, which is incorporated into ChatGPT.

Systems like DALL-E 3 are trained on an enormous number of examples — artwork, illustrations, photos and so on — usually sourced from public sites and datasets around the web. OpenAI and other generative AI vendors argue that fair use, the legal doctrine that allows for the use of copyrighted works to make a secondary creation as long as it’s transformative, shields their practice of scraping public data and using it for training without compensating or even crediting artists.

OpenAI, in fact, recently argued that it would be impossible to create useful AI models absent copyrighted material. “Training AI models using publicly available internet materials is fair use, as supported by long-standing and widely accepted precedents,” writes the company in a January blog post. “We view this principle as fair to creators, necessary for innovators, and critical for U.S. competitiveness.”

Creators, unsurprisingly, disagree.

A class action lawsuit brought by artists including Grzegorz Rutkowski, known for his work on Dungeons & Dragons and Magic: The Gathering, against OpenAI and several of its rivals (Midjourney and DeviantArt) is making its way through the courts. The plaintiffs argue that tools like DALL-E 3 and Midjourney replicate artists’ styles without the artists’ explicit permission, allowing users to generate new works that resemble the artists’ originals and for which the artists receive no payment.

OpenAI has licensing agreements in place with some content providers, like Shutterstock, and allows webmasters to block its web crawler from scraping their sites for training data. In addition, like some of its rivals, OpenAI lets artists “opt out” of and remove their work from the datasets that the company uses to train its image-generating models. (Some artists, however, have described the opt-out tool, which requires submitting an individual copy of each image to be removed along with a description, as onerous.)

Deng said that he believes artists should have more agency in the creation and use of generative AI tools like DALL-E, but isn’t sure, exactly, what form that might take.

“[A]rtists need to be a part of [the] ecosystem as much as possible,” Deng said. “I believe that if we can find a way to make the flywheel of creating art faster, we’ll really help the industry out a bit more … In a sense, every artist has been inspired by artists who’ve come before them, and I wonder how much of that will be accelerated by this.”