DefinedCrowd is teaching machines to better understand the complexities of language

What DefinedCrowd offers isn’t particularly easily to distill into a quick elevator pitch. Taking the stage today as part of the Disrupt New York Battlefield, the Washington state-based company deals in complex concepts of machine learning, providing rich data sets for speech and natural language processing systems.

When breaking down its offerings, co-founder and CEO Daniela Braga often veers into abstract philosophical concepts to explain the hurdles it’s working to overcome in language processing systems. Among its offerings are the ability to help machines understand the intent of speakers, which often relies on subtle vocal distinctions.

But companies searching for solutions to those problems apparently didn’t need all that much convincing. Since making its debut last year, the company has already snagged a number of key high-profile clients. It’s not ready to announce those publicly, but key big name investors like Amazon and Sony (who both contributed to its $1.1 million seed fund round in September) give some insight into the category and quality of interested parties.

DefinedCrowd first made waves as part of Microsoft’s Accelerator. Prior to her CEO role, Braga, who holds a PhD in Speech Technology, was tasked with managing a $14 million budget allocated by Microsoft solely for the purpose of gathering data for Cortana.

When the company spun off, it took some Microsoft talent along with it, including DefinedCrowd CTO João Freitas and CBDO Aya Zook, both of whom had senior roles at the software giant prior to jumping on board with the startup. Braga says her work at Microsoft directly inspired the company’s creation, when she noted an underserved need in the growing fields of machine learning and AI.

“In 2014, something hit me,” she explains. “I thought, ‘well there’s huge opportunity and demand.’ All of the corporations are starting to look at human-computer interaction and AI in a serious way with a lot of investment. I realized that the technology was ready, the market was ready, and there was no company bringing the scalability from a SaaS (Software as a service) perspective to mitigate this data need in an efficient way.”

The company’s Cortana-connected beginnings offer some insight into how its services are being utilized — as do its investors. After all, Amazon’s investment comes through its Alexa Fund, a pretty clear indication that it sees DefinedCrowd’s services as a valuable resource for its massively popular smart assistant.

Of course, the company’s potential extends beyond the handful of AI assistants currently duking it out over mobile devices and the connected home, but for companies like Amazon, Apple, Google and Microsoft, products like Alexa and Siri represent the tip of the spear for investments in the space.

But in spite of all of the money and employee hours being pumped into these products, even the largest companies can benefit from what DefinedCrowd is offering. The company provides large data sets tailored to the specific needs of an individual client, distilled in a format that’s easy to use on the back end.

“Most of the data science companies are actually working on the latest and greatest model imaging learning — the algorithms and so on,” Braga. “Very few are looking at the data problems from the enterprise scale perspective. That’s what we do. We are a platform designed by data scientists for data scientists.”

[gallery ids="1491591,1491592,1491593,1491558,1491554,1491555,1491557,1491559"]

Q: What do you offer that the bigger companies can’t?

A: The bigger companies have the data and the machine learning and the talent. Our technology combines the crowd-sourcing component, the humans in the loop. We have people in 53 countries across 42 languages and lots of patents on quality control for people working in AI tasks.

Q: Who owns the data?

A: The clients. We work mainly with corporations and they don’t want to have their data shared with other competitors.

Q: How do you stand out among the competition?

We have a number of requests, mostly around speech and NLP. Often they have some kind of AI on the modeling side, but they really need the data side to enhance that experience. They might have the basics, but in order to scale, they come to us because we offer scale the competition can’t.

Q: Do you focus on smaller projects or permanent needs at larger scale?

A: It’s a mixed bag. What we’ve seen early on is one-offs and data scientists coming to us directly. But people

Q: If you don’t own the data, what kind of IP are you building around, cleaning the data, especially when you start working with different kinds of data sets?

A: We own the models. We train speech recognition models, entity extraction and a couple of other data mining activities.

Q: What’s your customer acquisition strategy?

A: For the most part, we’ve been very fortunate to have many natural connections from the corporate world, but having the backing from the likes of Amazon and Sony has opened up a lot of doors for us. For the most part, it’s been very organic, but our roadmap is to dial up that marketing piece, as well.