Microsoft extends generative AI copyright protections to more customers

Microsoft is expanding its policy to protect commercial customers from copyright infringement lawsuits arising from the use of generative AI — but with a caveat (or several).

Today during Ignite, Microsoft said that customers licensing Azure OpenAI Service, the company’s fully managed service that adds governance layers on top of OpenAI models, can expect to be defended — and compensated — by Microsoft for any “adverse judgements” if they’re sued for copyright infringement while using Azure OpenAI Service or the outputs it generates.

Generative AI models such as ChatGPT and DALL-E 3 are trained on millions to billions of e-books, art pieces, emails, songs, audio clips, voice recordings and more, most of which come from public websites. While some of this training data is in the public domain, some isn’t — or comes under a license that requires citation or specific forms of compensation.

The legality of vendors training on data without permission is another matter that’s being hashed out in the courts. But what could land generative AI users in trouble is regurgitation, which is when a generative model spits out a mirror copy of a training example.

Microsoft’s expanded policy won’t apply by default to every Azure OpenAI Service customer. To be eligible for the new protections, subscribers are on the hook for implementing “technical measures” and complying with certain documentation to mitigate the risk of generating infringing content using OpenAI’s models.

TechCrunch asked Microsoft to elaborate on these measures, but the company declined to provide specifics ahead of the announcement this morning.

It’s also unclear if the protections extend to Azure OpenAI Service products in preview, like GPT-4 Turbo with Vision, and whether Microsoft is offering indemnity against claims made over the training data used by customers to fine-tune OpenAI models. We asked for clarification.

Late this afternoon, a Microsoft spokesperson told TechCrunch via email that the policy applies to all products in paid preview and to claims over Microsoft’s training data, but not a customer’s.

The new policy comes after Microsoft’s announcement in September that it’ll pay legal damages on behalf of customers using some — but not all — of its AI products if they’re sued for copyright infringement. As with the Azure OpenAI Service protections, customers are required to use the “guardrails and content filters” built into Microsoft’s AI offerings in order to retain coverage.

Perhaps not coincidentally, OpenAI recently said that it would begin paying the legal costs incurred by customers who face lawsuits over IP claims against work generated by OpenAI tools. Microsoft’s new Azure OpenAI Service protections would appear to be an extension of this.

Beyond indemnity policies, a partial solution to the regurgitation problem is allowing content creators to remove their data from generative model training data sets, or giving those creators some form of credit and compensation. OpenAI has said that it’ll explore this with future text-to-image models, perhaps the follow-up to DALL-E 3.

Microsoft, in contrast, hasn’t committed to opt-out or compensation schemes. But the company has developed a technology it claims can help “identify when [AI] models generate material that leverages third-party intellectual property and content.” A new feature in Microsoft’s Azure AI Content Safety tool, it’s available in preview.

We asked for background on how the IP-identifying tech works, but Microsoft demurred — simply pointing to a high-level blog post. We’ll keep our eyes peeled for more details at Ignite.