Derivative works are generative AI’s poison pill


Plastic banana beside real banana
Image Credits: Jorg Greuel / Getty Images

Simeon Simeonov


Simeon Simeonov is the CTO of Real Chemistry, which combines advanced AI with deep human insights to improve healthcare and patient outcomes.

Meta’s recent Llama 2 launch demonstrated the explosion of interest in open source large language models (LLMs), and it was heralded as the first open source LLM from Big Tech with a commercial license.

In all the excitement, it’s easy to forget the real cloud of uncertainty over legal issues like IP (intellectual property) ownership and copyright in the generative AI space. Generally, people are jumping in under the assumption that regulatory risk is something that the companies creating LLMs need to worry about.

That assumption is dangerous because it ignores generative AI’s poison pill: derivatives.

While “derivative works” have specific legal treatment under copyright law, there are few precedents for laws or regulations addressing data derivatives, which are, thanks to open source LLMs, about to get a lot more prevalent.

When a software program generates output data based on input data, which output data is a derivative of the input data? All of it? Some of it? None of it?

An upstream problem, like a poison pill, spreads contagion down the derivative chain, expanding the scope of any claim as we get closer to real legal challenges over IP in LLMs.

Uncertainty about the legal treatment of data derivatives has been the status quo in software.

Why do LLMs change the game? It’s a perfect storm of three forces:

  • Centralization. Not until the advent of LLMs could a single piece of software generate variable outputs that were applicable in endless ways. LLMs produce not just text and images, but also code, audio, video, and pure data. Within a couple of years, long before the case law on IP ownership and copyright around LLMs settles, LLM use will be ubiquitous, increasing exposure if risk were to flow past LLM vendors to LLM users. This applies not just to copyright-related risk, but also to risk related to other possible harms caused by hallucinations, bias, and so on.
  • Incentives. Copyright holders have an incentive to argue for the broadest possible definition of LLM derivatives, as it increases the scope over which they can claim damages. Perversely, so do the major platform companies when imposing license restrictions in their total warfare with other platforms. The Llama 2 license is a case in point: section 1.b.v prevents using Llama to “improve” non-Llama LLMs. Fuzzy definitions benefit rights holders and whoever has the biggest legal war chest.
  • Risk-shifting. Software platform companies are masters at shifting risk to their users. The software running the world today comes with an (extremely) limited liability license. Make no mistake: The major platform companies developing LLMs will try to shift risk to their users through legal agreements as well as political means. It’s one of the reasons Big Tech urges AI regulation: Think about how Section 230 protects social media platforms, despite the editorial-like role of algorithmic amplification.

If the courts rule that companies that train their models on copyrighted material are infringing on copyright, there are two distinct types of risk the enterprises that have built on top of those models will have to address:

  • Platform risk. Will the vendor pull the model off the market? If so, will a replacement model with comparable functionality be available? What will be the total effort of retuning models and prompts? How long will it take?
  • Pricing risk. If the vendor does not pull the model off the market, will the cost of using the model change due to the need to make copyright payments or introduce additional costs in developing or operating the LLM?

Of course, LLM vendors will argue that models themselves are not infringing, even if trained on copyrighted material. Models are just data that looks nothing like the source material. It is model outputs that may infringe on copyright: consider, for example, ChatGPT’s answer to the prompt “Reword the lyrics of Blinding Lights by The Weeknd.”

If the courts agree, enterprises have to manage another risk:

  • Flow-down risk. How does an enterprise ensure that its use of an LLM doesn’t violate copyright? How far does the risk extend beyond the direct outputs of the LLM to their derivatives, the value created by people, software, and systems using those outputs?
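
One way to reason about how far flow-down risk might extend is to record which artifacts were derived from which LLM outputs, and then walk that record downstream. The following is a minimal sketch under stated assumptions: the `LineageGraph` class, its method names, and the artifact names are all hypothetical and chosen for illustration. It shows only the bookkeeping, not a legal determination of what counts as a derivative.

```python
from collections import defaultdict, deque


class LineageGraph:
    """Hypothetical record of which artifacts were derived from which."""

    def __init__(self):
        # Maps a source artifact to the set of artifacts derived directly from it.
        self._children = defaultdict(set)

    def add_derivation(self, source, derived):
        """Record that `derived` was produced using `source`."""
        self._children[source].add(derived)

    def downstream(self, root):
        """Return every artifact reachable from `root` via derivation edges."""
        seen = set()
        queue = deque([root])
        while queue:
            node = queue.popleft()
            for child in self._children.get(node, ()):
                if child not in seen:
                    seen.add(child)
                    queue.append(child)
        return seen


# Illustrative data only: artifact names are made up.
graph = LineageGraph()
graph.add_derivation("llm_output_1", "marketing_copy")
graph.add_derivation("marketing_copy", "translated_copy")
graph.add_derivation("llm_output_1", "summary")
graph.add_derivation("human_draft", "press_release")

# Everything that could be exposed if a claim attached to llm_output_1.
exposed = graph.downstream("llm_output_1")
```

In this sketch, a claim against one LLM output would put three downstream artifacts in scope, while the human-authored press release stays outside the chain. Whether a court would treat each hop as a derivative is exactly the open question.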

Understanding the risks posed by generative AI’s poison pill also gives enterprise technology leaders the tools to manage them.

Our advice:

  • When considering LLM licenses, aim for clear ownership of LLM outputs and derivatives, and unrestricted use for improving other LLMs. In the absence of a clear definition of an LLM output derivative, establish a thoughtful policy on what counts as the copyright equivalent of a transformative change to LLM outputs. (Lowercasing the output probably isn’t, but summarizing the output using a different LLM probably is.) This will act as a firewall against flow-down risk.
  • When considering paid licenses, demand insulation from certain kinds of risk and address the economics of the relationship, should risk flow through the vendor to your business in the future. It is a lot cheaper for a large LLM platform vendor to buy the IP use rights important to your domain or, failing that, to set up specific types of insurance than it is for its customers to do so. There’s ample precedent in the cybersecurity space, with some vendors bundling ransomware insurance. In generative AI, Adobe is offering full indemnification for content created through Firefly, and Writer offers full indemnification for content generated through its platform.
  • Don’t ignore the political side: If LLM users do nothing, the end outcome will be meaningful regulatory protections for the large LLM platforms and Big Tech at the expense of LLM startups and users. ChatGPT Plus and Microsoft’s expected pricing for generative AI capabilities in Office fall in the range of $25–$30 per user per month. At that level of revenue, most types of risk shouldn’t flow down to paid users.
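
As a rough illustration of the first point, an internal policy on transformative change could start with an automated screen that ignores trivial edits such as lowercasing before comparing a derived text with the LLM output it came from. This is a minimal sketch using Python’s standard `difflib`; the function names and the 0.6 threshold are assumptions made for the example, and character-level similarity is a crude proxy, not a legal test of transformation.

```python
import difflib


def normalize(text):
    """Strip changes that are clearly trivial: letter case and extra whitespace."""
    return " ".join(text.lower().split())


def similarity(original, derived):
    """Return a 0.0-1.0 similarity ratio between the normalized texts."""
    matcher = difflib.SequenceMatcher(None, normalize(original), normalize(derived))
    return matcher.ratio()


def looks_transformed(original, derived, threshold=0.6):
    """Flag a derived text as substantially changed.

    The threshold is a hypothetical policy knob, not a legal standard.
    """
    return similarity(original, derived) < threshold


# Lowercasing alone leaves the normalized texts identical, so it is not flagged.
lowercased_only = looks_transformed("Hello World", "hello   world")
```

A screen like this can only route cases for human or legal review. It says nothing about whether a summary produced by a different LLM would actually be treated as transformative.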

The world of software had a similar issue with “viral” / “copyleft” open source licenses focused on derivatives, epitomized by the GPL. Open source exploded at the same time as SaaS and cloud computing did. For better or worse, SaaS applications and cloud infrastructure got around the GPL poison pill by not distributing software. The AGPL license closed the loophole and is often the choice of open source efforts backed by businesses that want to exert control over their value chain (e.g., MongoDB, Nextcloud, OpenERP, and RStudio).

By contrast, most organic open source projects use more permissive licenses (Apache 2.0, BSD, MIT). Will open source LLMs save the day? They might help enterprises get around certain commercial LLM license restrictions, but they don’t insulate LLM users from copyright risk.

Just as the world of open source licensing bifurcated, so will the world of LLM vendors. Some platforms will follow the status quo of “push all risk to users.” Other enterprise platforms will differentiate by partnering with their customers to manage risk. Risk management will take many forms, from verticalized training over clearly defined input data with traceable usage rights all the way to services that, similar to certain private messaging platforms, make the enforcement of any legal action against their users impractical.

Balancing LLM capabilities with risk management is likely to get more complex as we ease out of the Wild West era of AI — but certainly well worth the effort.
