The hot new thing: AI platforms that stop AI’s mistakes before production

If you haven’t noticed, a growing amount of code that’s being generated today is “AI-assisted.” In fact, Scott Guthrie, Microsoft’s executive vice president of Cloud and AI, estimated back in March that upwards of 40% of the code that developers were uploading to the AI developer tool GitHub Copilot was both “AI-generated and unmodified.”

Now, the trend is giving rise to startups that promise to keep AI-augmented code from mucking up the works — and investors are taking notice.

Earlier this week, an Israel-based startup, Digma, announced $6 million in seed funding for a continuous feedback platform that runs locally on developers’ machines and helps them analyze their code — including generative AI-created code — to identify issues. Yesterday, a San Francisco-based testing platform called Kolena announced its own funding — $15 million — to build tools to test, benchmark and validate the performance of AI models.

Today, a months-old, four-person Bay Area startup called Braintrust is taking the wraps off its own fresh funding round of $3 million. According to co-founder and CEO Ankur Goyal, Braintrust is like an “operating system for engineers building AI software,” one that helps them avoid bad results from AI models. Developers building customer support chatbots, for example, might use Braintrust’s tech to ensure that their chatbot answers questions accurately rather than hallucinating false information.

Like many startups promising the ability to build more reliable AI software, Braintrust has savvy backers. Renowned angel investor Elad Gil is among its investors and helped incubate Braintrust’s initial product. (Gil flagged the round for us, calling the six-week-old outfit “a good one.”) Other notable investors include Adam D’Angelo of Quora, Clem Delangue of the buzzy AI outfit Hugging Face and OpenAI co-founder Greg Brockman.

Whether an impressive investing syndicate can help push Braintrust to the front of the pack is an open question. In the meantime, ensuring that AI code doesn’t break a company’s workflow is a problem Goyal says he was practically born to solve.

The child of doctors, Goyal grew up in Pittsburgh and thought he’d become a doctor, too. “Super nerdy” as a teenager, he says a high school linear algebra class, where he learned about Google’s PageRank algorithm, changed his life. (“I get goosebumps just talking about it,” he says.) He moved on from biology, studied computer science at Carnegie Mellon University, then “out of extreme boredom” dropped out in his junior year to build a relational database system at MemSQL, an early Y Combinator alum. More than five years later, Goyal co-founded his own company, Impira, and when Figma acquired the company late last year, Goyal became the head of its machine learning platform.

It was a good gig. It also gave Goyal even more insight into the growing challenge of building high-quality software products in this new age of AI everything. So late this past summer, he left to start Braintrust.

“I spent quite some time building old-school software,” he says, “and what’s really different about AI is that it’s inherently nondeterministic, meaning if you write code, you can’t really guarantee that it’s going to work. You have to test it on real-world examples.” The process is called evaluation or, colloquially, “evals,” and not all companies have data of high enough quality to do enough testing. It’s why Braintrust, which has yet to commercialize its product, is working with companies that do, including the workflow automation company Zapier and the spreadsheet tool company Coda, which are currently beta testing what Braintrust has built.

“Their challenge,” says Goyal, “is ‘Okay, we actually have tons of data, and we have lots of users using our product. But it’s really hard for us to boil that down into a representative set of examples that we can use to test our software.’” With Braintrust, he says, “They can dump as much data as they want into our product, run evaluations against it, and we’ll help them curate ‘golden datasets’ that they can accumulate over time and use as a measure of whether their software is working or not.”
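
For readers unfamiliar with the idea, here is a minimal sketch in Python of what running an evaluation against a small “golden dataset” can look like. It is not Braintrust’s product or API. Every name in it (Example, exact_match, run_eval, toy_chatbot, GOLDEN) is hypothetical, the data is invented for illustration, and the exact-match scorer is only one simple choice among many.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Example:
    """One curated input paired with the answer we expect (hypothetical type)."""
    question: str
    expected: str


def exact_match(output: str, expected: str) -> float:
    """Return 1.0 when the answers agree after trimming and lowercasing, else 0.0."""
    return 1.0 if output.strip().lower() == expected.strip().lower() else 0.0


def run_eval(
    task: Callable[[str], str],
    dataset: List[Example],
    scorer: Callable[[str, str], float] = exact_match,
) -> float:
    """Run the task on every example and return the mean score."""
    if not dataset:
        raise ValueError("dataset must contain at least one example")
    scores = [scorer(task(ex.question), ex.expected) for ex in dataset]
    return sum(scores) / len(scores)


# Invented golden dataset; in practice it would be curated from real usage.
GOLDEN = [
    Example("What is the capital of France?", "Paris"),
    Example("How many days are in a week?", "7"),
    Example("What color is a ripe banana?", "Yellow"),
    Example("What is 2 + 2?", "4"),
]


def toy_chatbot(question: str) -> str:
    """Stand-in for a real model call; it deliberately misses one question."""
    canned = {
        "What is the capital of France?": "paris",
        "How many days are in a week?": "7",
        "What color is a ripe banana?": "Yellow",
    }
    return canned.get(question, "I'm not sure.")


if __name__ == "__main__":
    score = run_eval(toy_chatbot, GOLDEN)
    print(f"accuracy on golden set: {score:.2f}")  # prints 0.75
```

Rerunning the same dataset after each model or prompt change gives a single number to compare over time, which is the role Goyal describes for golden datasets.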

As a bonus, says Goyal, “We actually run inside of their cloud environments,” which lets Braintrust sidestep thorny compliance issues that could otherwise slow its adoption within enterprises.

It’s early days, of course, and competition will only grow fiercer in the coming months and years. Deepchecks, an Israeli startup whose tagline is “continuous validation for AI,” is yet another outfit that recently raised seed funding.

Still, Goyal describes Braintrust as exactly the product he needed at Figma, one that didn’t exist until Braintrust recently created it. “There’s a whole universe around continuous integration that has developed over the past decade. And that’s kind of turned this into a science of shipping software. But in AI land, that methodology and workflow, until our product, just didn’t really exist.”

Pictured above, from left to right: Coleen Baik (founding designer), Ankur Goyal (CEO), Manu Goyal (founding engineer) and David Song (product manager; part of Elad Gil’s team).