Derivative works are generative AI’s poison pill


Simeon Simeonov

Contributor

Simeon Simeonov is the CTO of Real Chemistry, which combines advanced AI with deep human insights to improve healthcare and patient outcomes.

Meta’s recent Llama 2 launch demonstrated the explosion of interest in open source large language models (LLMs); it was heralded as the first open source LLM from Big Tech with a commercial license.

In all the excitement, it’s easy to forget the real cloud of uncertainty over legal issues like IP (intellectual property) ownership and copyright in the generative AI space. Generally, people are jumping in under the assumption that regulatory risk is something that only the companies creating LLMs need to worry about.

It’s a dangerous assumption without considering generative AI’s poison pill: derivatives.

While “derivative works” have specific legal treatment under copyright law, there are few precedents for laws or regulations addressing data derivatives, which are, thanks to open source LLMs, about to get a lot more prevalent.

When a software program generates output data based on input data, which output data is a derivative of the input data? All of it? Some of it? None of it?

An upstream problem, like a poison pill, spreads contagion down the derivative chain, expanding the scope of any claim as we get closer to real legal challenges over IP in LLMs.

Uncertainty about the legal treatment of data derivatives has been the status quo in software.

Why do LLMs change the game? It’s a perfect storm of three forces:

  •   Centralization. Not until the advent of LLMs could a single piece of software generate variable outputs that were applicable in endless ways. LLMs produce not just text and images, but also code, audio, video, and pure data. Within a couple of years, long before the case law on IP ownership and copyright around LLMs settles, LLM use will be ubiquitous, increasing exposure if risk were to flow past LLM vendors to LLM users. This applies not just to copyright-related risk, but also to risk related to other possible harms caused by hallucinations, bias, and so on.
  •   Incentives. Copyright holders have an incentive to argue for the broadest possible definition of LLM derivatives, as it increases the scope over which they can claim damages. Perversely, so do the major platform companies when imposing license restrictions in their total warfare with other platforms. The Llama 2 license is a case in point: section 1.b.v prevents using Llama to “improve” non-Llama LLMs. Fuzzy definitions benefit rights holders and whoever has the biggest legal war chest.
  •   Risk-shifting. Software platform companies are masters at shifting risk to their users. The software running the world today comes with an (extremely) limited liability license. Make no mistake: The major platform companies developing LLMs will try to shift risk to their users through legal agreements as well as political means. It’s one of the reasons Big Tech urges AI regulation: Think about how Section 230 protects social media platforms, despite the editorial-like role of algorithmic amplification.

If the courts rule that companies that train their models on copyrighted material are infringing on copyright, there are two distinct types of risk the enterprises that have built on top of those models will have to address:

  • Platform risk. Will the vendor pull the model off the market? If so, will a replacement model with comparable functionality be available? What will be the total effort of retuning models and prompts? How long will it take?
  • Pricing risk. If the vendor does not pull the model off the market, will the cost of using the model change due to the need to make copyright payments or introduce additional costs in developing or operating the LLM?

Of course, LLM vendors will argue that the models themselves are not infringing, even if trained on copyrighted material: models are just data that looks nothing like the source material. It is model outputs that may infringe on copyright (e.g., consider ChatGPT’s answer to the prompt “Reword the lyrics of Blinding Lights by The Weeknd”).

If the courts agree, enterprises have to manage another risk:

  • Flow-down risk: How does an enterprise ensure that its use of an LLM doesn’t violate copyright? How far does the risk extend beyond the direct outputs of the LLM to their derivatives, the value created by people, software and systems using those outputs?

Understanding the risks posed by generative AI’s poison pill also gives enterprise technology leaders the tools to manage them.

Our advice:

  • When considering LLM licenses, aim for clear ownership of LLM outputs and derivatives, and unrestricted use for improving other LLMs. In the absence of a clear definition of an LLM output derivative, establish a thoughtful policy about what is the copyright equivalent of transformative change of the LLM outputs. (Lowercasing the output probably isn’t, but summarizing the output using a different LLM probably is.) This will act as a firewall against flow-down risk.
  • When considering paid licenses, demand insulation from certain kinds of risk and address the economics of the relationship, should risk flow through the vendor to your business in the future. It is a lot cheaper for a large LLM platform vendor to buy IP use rights important to your domain, or, failing that, to set up specific types of insurance, than it is for their customers to do it. There’s ample precedent in the cybersecurity space, with some vendors bundling ransomware insurance. In generative AI, Adobe offers full indemnification for content created through Firefly, and Writer offers full indemnification for content generated through its platform.
  • Don’t ignore the political side: If LLM users do nothing, the end outcome will be meaningful regulatory protections for the large LLM platforms and Big Tech at the expense of LLM startups and users. ChatGPT Plus and Microsoft’s expected pricing for generative AI capabilities in Office fall in the $25–$30/month/user range. At that level of revenue, most types of risk shouldn’t flow down to paid users.
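To make the policy idea in the first bullet concrete, here is a minimal sketch of how an enterprise might track the provenance of LLM outputs and check whether a downstream artifact has passed through a transformative step. The transformation labels, the `Artifact` class, and the split into transformative vs. non-transformative sets are all hypothetical illustrations of a policy decision, not legal categories:

```python
from dataclasses import dataclass, field

# Hypothetical policy: which transformations count as "transformative"
# is a judgment call each enterprise must make, not settled law.
TRANSFORMATIVE = {"summarized_by_other_llm", "human_rewrite"}
NON_TRANSFORMATIVE = {"lowercased", "reformatted", "verbatim_copy"}

@dataclass
class Artifact:
    content: str
    # Chain of transformation labels applied since the raw LLM output.
    history: list = field(default_factory=list)

    def derive(self, content: str, transformation: str) -> "Artifact":
        """Create a downstream artifact, carrying provenance forward."""
        return Artifact(content, self.history + [transformation])

    def is_firewalled(self) -> bool:
        """True if at least one transformative step separates this
        artifact from the raw LLM output."""
        return any(t in TRANSFORMATIVE for t in self.history)

raw = Artifact("raw LLM output")
lowered = raw.derive("raw llm output", "lowercased")
summary = lowered.derive("a fresh summary", "summarized_by_other_llm")

print(raw.is_firewalled())      # False
print(lowered.is_firewalled())  # False
print(summary.is_firewalled())  # True
```

A provenance chain like this lets an enterprise audit, for any given artifact, how far it sits from the direct output of an LLM, which is exactly the question flow-down risk turns on.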

The world of software had a similar issue with “viral” / “copyleft” open source licenses focused on derivatives, epitomized by the GPL. Open source exploded at the same time as SaaS and cloud computing did. For better or worse, SaaS applications and cloud infrastructure got around the GPL poison pill by not distributing software. The AGPL license closed the loophole and is often the choice of open source efforts backed by businesses that want to exert control over their value chain (e.g., MongoDB, Nextcloud, OpenERP, and RStudio).

By contrast, most organic open source projects use more permissive licenses (Apache 2.0, BSD, MIT). Will open source LLMs save the day? They might help enterprises get around certain commercial LLM license restrictions, but they don’t insulate LLM users from copyright risk.

Just as the world of open source licensing bifurcated, so will the world of LLM vendors. Some platforms will follow the status quo of “push all risk to users.” Other enterprise platforms will differentiate by partnering with their customers to manage risk. Risk management will take many forms, from verticalized training over clearly defined input data with traceable usage rights all the way to services that, similar to certain private messaging platforms, make the enforcement of any legal action against their users impractical.

Balancing LLM capabilities with risk management is likely to get more complex as we ease out of the Wild West era of AI, but it will be well worth the effort.
