Nvidia releases a toolkit to make text-generating AI ‘safer’

For all the fanfare, text-generating AI models like OpenAI’s GPT-4 make a lot of mistakes — some of them harmful. The Verge’s James Vincent once called one such model an “emotionally manipulative liar,” which pretty much sums up the current state of things.

The companies behind these models say that they’re taking steps to fix the problems, like implementing filters and employing teams of human moderators to correct issues as they’re flagged. But there’s no one right solution. Even the best models today are susceptible to biases, toxicity and malicious attacks.

In pursuit of “safer” text-generating models, Nvidia today released NeMo Guardrails, an open source toolkit aimed at making AI-powered apps more “accurate, appropriate, on topic and secure.”

Jonathan Cohen, the VP of applied research at Nvidia, says the company has been working on Guardrails’ underlying system for “many years” but just about a year ago realized it was a good fit for models along the lines of GPT-4 and ChatGPT.

“We’ve been developing toward this release of NeMo Guardrails ever since,” Cohen told TechCrunch via email. “AI model safety tools are critical to deploying models for enterprise use cases.”

Guardrails includes code, examples and documentation to “add safety” to AI apps that generate text as well as speech. Nvidia claims that the toolkit is designed to work with most generative language models, allowing developers to create rules using a few lines of code.

Specifically, Guardrails can be used to prevent — or at least attempt to prevent — models from veering off topic, responding with inaccurate information or toxic language, and making connections to “unsafe” external sources. Think keeping a customer service assistant from answering questions about the weather, for instance, or a search engine chatbot from linking to disreputable academic journals.
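To make the idea of a topical rail more concrete, here is a toy sketch in Python of the customer service example. It is not NeMo Guardrails’ actual API: the blocked-topic list, the refusal message and every function name below are hypothetical. A simple keyword match is also far cruder than what a real toolkit would likely rely on; it only illustrates the general pattern of checking a request before it ever reaches the model.

```python
import re
from typing import Callable

# Hypothetical off-topic terms for a customer service assistant. A keyword
# list is a deliberately crude stand-in for a smarter topic check.
BLOCKED_TOPICS = {"weather", "forecast", "rain", "temperature"}

# Hypothetical canned reply returned instead of calling the model.
REFUSAL = "Sorry, I can only help with questions about your account."


def is_off_topic(message: str) -> bool:
    """Return True if any blocked topic word appears in the message."""
    words = set(re.findall(r"[a-z]+", message.lower()))
    return not words.isdisjoint(BLOCKED_TOPICS)


def with_topical_rail(generate: Callable[[str], str]) -> Callable[[str], str]:
    """Wrap a text generator so off-topic requests never reach the model."""
    def guarded(message: str) -> str:
        if is_off_topic(message):
            return REFUSAL
        return generate(message)
    return guarded


def fake_model(message: str) -> str:
    # Placeholder for a call to a real language model.
    return f"Model answer to: {message}"


assistant = with_topical_rail(fake_model)
print(assistant("How do I reset my password?"))  # → Model answer to: How do I reset my password?
print(assistant("Will it rain tomorrow?"))       # → Sorry, I can only help with questions about your account.
```

Wrapping the generator, rather than editing the model itself, mirrors the article’s point that developers decide what is out of bounds: change the rule and the same underlying model behaves differently. It also shows why such rules can be too broad or too narrow, since a question mentioning “temperature” in another sense would be refused too.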

“Ultimately, developers control what is out of bounds for their application with Guardrails,” Cohen said. “They may develop guardrails that are too broad or, conversely, too narrow for their use case.”

A universal fix for language models’ shortcomings sounds too good to be true, though — and indeed, it is. While companies like Zapier are using Guardrails to add a layer of safety to their generative models, Nvidia acknowledges that the toolkit isn’t perfect; it won’t catch everything, in other words.

Cohen also notes that Guardrails works best with models that are “sufficiently good at instruction-following,” à la ChatGPT, and that use the popular LangChain framework for building AI-powered apps. That disqualifies some of the open source options out there.

And — effectiveness of the tech aside — it must be emphasized that Nvidia isn’t necessarily releasing Guardrails out of the goodness of its heart. It’s a part of the company’s NeMo framework, which is available through Nvidia’s enterprise AI software suite and its NeMo fully managed cloud service. Any company can implement the open source release of Guardrails, but Nvidia would surely prefer that they pay for the hosted version instead.

So while there’s probably no harm in Guardrails, keep in mind that it’s not a silver bullet — and be wary if Nvidia ever claims otherwise.