Freeplay wants to help companies test and build LLM-powered apps

Freeplay, a startup that lets companies build, experiment with and test apps powered by generative AI models, specifically text-generating models, today emerged from stealth with $3.25 million in a seed round co-led by Conviction Ventures and Matchstick Ventures.

Founded by ex-Twitter employees, including the former heads of product and engineering for Twitter's developer platform and enterprise data business, Freeplay aims to give product dev teams tools to prototype and improve software features powered by large language models, such as the models behind ChatGPT or Meta's Llama 2.

“For seasoned product dev teams who might be new to AI, we provide a tool suite that helps them adopt best practices,” Ian Cairns, Freeplay’s co-founder and CEO, told TechCrunch in an email interview. “Freeplay gives these teams confidence to integrate LLMs into their products and ultimately deliver better customer experiences.”

Cairns and Eric Ryan co-founded Freeplay last year. They met at Gnip, a social media API aggregation company, which Twitter acquired in 2014. After the acquisition, Cairns and Ryan joined Twitter, where Cairns led the developer platform and Ryan was the senior director of engineering at Twitter’s Boulder office.

According to Cairns, he and Ryan were spurred to launch Freeplay by the challenges they saw enterprises encountering in embracing LLMs. Many existing observability tools struggled to track LLM outputs at scale, Cairns and Ryan found, while experimentation practices weren’t keeping pace with the fast-moving generative AI field.

“We were seeing how transformative LLMs would become as the technology moved from the research space to production use,” Cairns said. “In particular, many of the business-to-business software-as-a-service companies we’d worked with for years had never built with machine learning technology before, and we saw the need for new tools and new development practices to help those types of companies adopt LLMs and then improve over time.”

Freeplay’s platform combines developer integrations with a web-based dashboard. From the dashboard, teams can view how users are interacting with an AI-powered app as well as metrics like the estimated costs associated with running the app and the app’s average latency.

Beyond observability, Freeplay offers beginner-friendly features that allow users to experiment with different prompts — i.e. instructions to LLMs (“Answer questions in a way a five-year-old could understand,” “Respond with step-by-step instructions” and so on) — and swap models from different vendors (e.g. OpenAI, Anthropic) in live software. Freeplay also hosts tools to help identify and implement custom evaluations of LLMs, leveraging what Cairns calls “auto-evaluators” (automated testing tools powered by LLMs) combined with human labeling workflows.

“We help customers build a feedback loop to optimize their LLM evaluations,” Cairns said. “For example, accountants might need to review outputs for an AI accounting feature, or doctors and scientists might need to review outputs for biotech or healthcare applications … This helps [companies] build a high-quality data set that becomes an asset to further optimize the customer experience and cut costs, including by fine-tuning LLMs.”

But what sets Freeplay apart from the growing collection of tools on the market to build and benchmark AI-powered apps?

There are generative AI-focused observability platforms like Helicone, plus platforms for tracking and sharing prompts, such as PromptLayer and LangSmith. Elsewhere, public cloud incumbents like AWS, Google Cloud and Azure have been rolling out products to address the new development challenges arising from generative AI.

Cairns acknowledges the competition. But he claims that most vendors either focus on a "narrow slice of functionality" aimed at individual developers or cater to experienced machine learning and data science teams, at the expense of organizations whose staff have a wider range of skills and backgrounds.

"Our end-to-end toolset helps teams get from prototype to production with confidence and then optimize the customer experience over time," Cairns said. "Some tools have emerged that are very focused on developers, but don't meet the needs of cross-functional teams who build together, especially at larger companies. Others focus on one narrow use case, but don't cover the full development lifecycle. Freeplay brings together an end-to-end workflow that still gives developers the control they need."

Cairns claims that Freeplay has already seen some traction, with early customers paying "several hundred" to "several thousands" of dollars a month for the service. With the cash from the seed round, Freeplay plans to grow its 10-person workforce to between 12 and 15 people by the end of the year and "bring the core product to market," he says.