Deasie wants to rank and filter data to make generative AI more reliable

Deasie, a startup developing tools to give companies greater control over text-generating AI models, today announced that it raised $2.9 million in a seed funding round with participation from Y Combinator, General Catalyst, RTP Global, Rebel Fund and J12 Ventures.

Deasie’s founders, Reece Griffiths, Mikko Peiponen and Leo Platzer, previously built data governance tools together at McKinsey. While at McKinsey, they say they observed “significant problems” — and opportunities — around enterprise data governance, and specific ways in which these problems could impact a company’s ability to adopt generative AI.

They’re not the only ones. A recent IDC survey of more than 900 executives at large enterprises found that 86% agree more governance is needed to ensure the “quality and integrity” of generative AI insights. Just 30% of respondents to the survey, meanwhile, said that they felt “extremely prepared or ready” to leverage generative AI today.

In an effort to make generative AI models — specifically large language models (LLMs) along the lines of OpenAI’s GPT-4 — more reliable, the Deasie team built a product that connects to unstructured company data like documents, reports and emails to automatically categorize them in terms of their contents and sensitivity.

For example, Deasie might auto-tag a report “personally identifiable information” or “proprietary information” and indicate that it’s the third version of the report. Or it might tag a spec sheet “proprietary information” and highlight that the sheet has restricted access rights. Deasie customers define the tags and labels to reflect their approach to classifying and organizing data, Griffiths told TechCrunch via email, which “teaches” Deasie’s algorithms how to classify future data. 
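Deasie hasn’t published how its auto-tagging works under the hood. As a rough illustration of the idea described above — customer-defined tags steering how documents get classified — here’s a minimal sketch in Python; the class names, example phrases and keyword-matching heuristic are all hypothetical, not Deasie’s actual method.

```python
# Hypothetical sketch: customer-defined tags guide document classification.
# All names and the keyword-matching heuristic are illustrative only.
from dataclasses import dataclass, field

@dataclass
class TagDefinition:
    name: str                                  # e.g. "proprietary information"
    example_phrases: list = field(default_factory=list)

def classify_document(text: str, tag_defs: list[TagDefinition]) -> list[str]:
    """Assign every tag whose example phrases appear in the document."""
    lowered = text.lower()
    return [t.name for t in tag_defs
            if any(phrase.lower() in lowered for phrase in t.example_phrases)]

# A customer defines tags that reflect its own way of organizing data.
tags = [
    TagDefinition("personally identifiable information", ["social security", "date of birth"]),
    TagDefinition("proprietary information", ["internal use only", "confidential"]),
]

report = "Confidential: internal use only. Employee date of birth records follow."
print(classify_document(report, tags))
# ['personally identifiable information', 'proprietary information']
```

In practice a system like Deasie’s would presumably learn from the customer-labeled examples rather than match keywords, but the interface — customer-supplied labels in, per-document tags out — is the part the company has described.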

After Deasie auto-tags documents, the platform works through the resultant library of tags to evaluate the corresponding data in terms of its overall relevance and importance. Then, based on this assessment, it decides which data to “feed” to a text-generating model.
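That description suggests a two-stage pipeline: tag first, then score the tagged documents before anything reaches the model. A minimal sketch of the filtering step might look like the following — the weights, threshold and sensitive-tag blocklist are invented for illustration and shouldn’t be read as Deasie’s scoring logic.

```python
# Hypothetical sketch of the filtering step: score tagged documents and
# pass only relevant, non-sensitive ones to the language model.
# Weights, threshold and the blocklist are illustrative assumptions.
SENSITIVE_TAGS = {"personally identifiable information"}
RELEVANCE_WEIGHTS = {"proprietary information": 0.4, "financial report": 0.8}

def select_documents(tagged_docs: dict[str, list[str]], threshold: float = 0.5) -> list[str]:
    """Return document IDs that score as relevant and carry no sensitive tags."""
    selected = []
    for doc_id, doc_tags in tagged_docs.items():
        if SENSITIVE_TAGS.intersection(doc_tags):
            continue  # never feed sensitive documents to the model
        score = sum(RELEVANCE_WEIGHTS.get(tag, 0.0) for tag in doc_tags)
        if score >= threshold:
            selected.append(doc_id)
    return selected

corpus = {
    "q3_financials.pdf": ["financial report", "proprietary information"],
    "employee_records.csv": ["personally identifiable information"],
    "old_memo.txt": [],
}
print(select_documents(corpus))  # ['q3_financials.pdf']
```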

“Enterprises have enormous volumes of unstructured data that have rarely received any attention from a governance perspective,” Griffiths said. “The probability that language models retrieve answers that don’t make sense, or are exposed to sensitive information, scales with the volume of data. Deasie is an intelligent platform that filters through thousands of documents across an enterprise and ensures that data being fed into generative AI applications is relevant, high-quality and safe to use.”

Deasie is an intriguing platform, to be sure. The idea of limiting an LLM to vetted data isn’t a bad one — particularly considering the consequences of letting LLMs loose on out-of-date and conflicting info. But I wonder how consistently Deasie’s algorithms classify data and how often the platform makes mistakes in sussing out a document’s importance.

Whatever demo Deasie is showing appears to answer those questions to at least a few companies’ satisfaction. Griffiths says Deasie — which only has three employees — has signed an agreement for its first pilot with a “multi-billion-dollar” enterprise in the U.S. and has a pipeline of over 30 enterprise customers, including five Fortune 500 companies.

“Other products have either focused on strictly the ‘data safety’ angle or the ‘data governance for structured data’ angle of LLM governance,” Griffiths said. “What didn’t exist was a good approach for measuring data quality and relevance for unstructured data … Nobody was directly solving the issue of matching every generative AI use case with the ‘best’ possible set of data. Deasie has developed novel approaches in this domain.”

In the next few months, Deasie plans to grow its engineering team and make “multiple hires,” with a focus on building features to differentiate from rivals like Unstructured.io, Scale AI, Collibra and Alation.