When you think of foundation models and what they can accomplish, chances are you are thinking of text and image creation. Numbers Station, however, is taking these models in a very different direction. The startup, which is announcing a $17.5 million Series A round today, is using these models to build what the company calls an “intelligent data stack automation platform.” The funding round was led by Madrona, with participation from Norwest Venture Partners, Factory and a number of angel investors, including Cloudera co-founder Jeff Hammerbacher.
Founded by Stanford PhDs Chris Aberger (CEO), Ines Chami (Chief Scientist) and Sen Wu, together with Stanford associate professor Chris Ré, Numbers Station aims to bring the power of GPT-style foundation models to enterprise use cases, starting with data transformation and record matching. To do this, the company uses these foundation models to, for example, translate natural language queries into SQL commands.
“We all got PhDs on a mix of AI and data systems,” Aberger told me when I asked him about the origins of the company. “We saw at the time that most of the AI talent was really focusing on — for lack of a better word — the sexier AI applications. All the things you see in the news: marketing content generation, image generation, whatever it might be. […] But most of the AI talent was not focusing on these dirty data plumbing, data munging, data wrangling, data preparation operations. It’s not as sexy as generating an image to say that we are going to reformat the dates in your database, but that’s still a huge enterprise problem and enterprise need.”
Aberger also noted that when the team looked at how data-centric teams inside many enterprises work together, it found that many capabilities were locked up in separate teams, even as those teams spent much of their time on the routine data transformation work needed to enable more complex use cases. He believes that Numbers Station, and the foundation models that power it, will be able to democratize access to these capabilities.
“At a high level, our mission is to accelerate those teams and accelerate the data analyst teams, so that they can spend more time giving insights and less time on these mundane data operations,” Aberger explained.
In practice, this means the service currently offers three distinct capabilities. The first is SQL transformation, which lets users specify what they need in natural language, with Numbers Station then generating the SQL query. The second is what the company calls “AI transformation,” that is, the ability to prototype intelligent data transformations powered by AI. And lastly, Numbers Station offers a record matching feature, which allows users to, for example, combine the records in their CRM and sales systems into a single database.
As Numbers Station co-founder and chief scientist Ines Chami told me, the team isn’t simply taking a foundation model and applying that to all of these use cases. “It’s really important to personalize and adapt the model specifically to the organization,” she said. “The idea is to generate answers that are specific to the organization, so we use fine-tuning techniques and also feedback.” She noted that the company starts users on a general purpose pre-trained model but then, as users provide the model with feedback, it generates smaller, organization-specific models for these users. “It’s very much human-in-the-loop in order to adapt [the model] to the organizational knowledge. So on all fronts, we’ve noticed it’s very important to go beyond the general-purpose model. That’s great to start, but very quickly, you need to fine-tune and specialize the models,” she explained, and also noted that the company always keeps every customer’s data siloed.
Aberger also stressed that in his view, foundation models from the likes of OpenAI will become commoditized. “What really matters is where you apply AI expertise on top of these models to make them perform really well for specific organizations and a specific organization’s tasks,” he said.
And make no mistake, Numbers Station is looking at these first three features as its entry point into the enterprise data stack. The larger vision here is to build a data intelligence platform that doesn’t just help businesses transform their data but also analyze it. But to do that, it first needs to help the businesses clean up their records.
“We believe foundation models, specifically how the Numbers Station team is applying them, provide a truly ‘zero to one’ opportunity to address the massive challenge of dealing with messy data,” said Madrona Managing Director Tim Porter, who will join the Numbers Station board. “Even more exciting, data prep is just the first step in the team’s ambitious vision. For instance, immediately connecting these intelligent transforms into automated workflows is another key advantage of the Numbers Station approach. We are thrilled to back this world-class customer-driven team as they build out the platform we believe all enterprises will see as a crucial element in their modern data stack.”