Y Combinator-backed Plasticity is tackling the problem of getting software systems to better understand text, using deep learning models trained on Wikipedia articles to understand what they’re reading, and offering an API for developers to enhance their own interfaces.
Specifically, they’re offering two APIs for developers to build “more robust conversational interfaces”, as they put it, with the aim of becoming a “centralized solution” for Natural Language Processing (NLP). Their APIs are due to be switched from private to public beta on Monday.
“One thing where we think this is really useful for is conversational interfaces where you want to integrate real world knowledge,” says co-founder Alex Sands. “We think it’s also really useful when you want to provide instant answers in your application — whether that’s over the entire Internet or over a custom corpus.”
One example might be search engines that are competing with Google and don’t have their own instant answer technology. “[They] could use something like this. We’re in talks with a few of them,” notes Sands.
“The other application is conversational interfaces who want a new NLP stack that will give them a lot more information than what an academic package like Stanford CoreNLP would provide them today,” he adds.
A few years back, the founders worked on a hack project that expanded the powers of Apple’s AI voice assistant Siri by adding support for custom commands, such as playing a Spotify track or dialing up the temperature via a Nest device. This was before Apple opened up Siri to third-party apps, so they were routing voice commands through a proxy, and they claim to have basically built “the first app store for voice commands”.
The experience taught them that “NLP in general is not robust” for handling more complex commands and queries, says the other co-founder, Ajay Patel.
“The other problem was a lot of the natural language processing tools out there really take a simplistic approach to understanding what a user says,” he adds. “The most simplistic way to explain it is they’re looking for keywords to figure out what a user is asking.”
Plasticity is taking a different approach from these keyword-based NLP systems: building a system that understands the semantics of a sentence so it can perform a linguistic breakdown “to figure out all of the relationships and entities in a sentence”.
They can then hand that information to developers so they can build “more robust conversational interfaces around it”, as Patel puts it — such as, for example, a chatbot that’s more conversational and capable, given it can serve up answers it found online.
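To make that contrast concrete, here is a minimal Python sketch. It is not Plasticity's code or API: the intent table, the verb list and the function names are all hypothetical, and the "relationship extractor" is a toy regular expression standing in for a real parser. It only shows why keyword spotting loses information that a relationship-based breakdown keeps.

```python
import re

# Hypothetical keyword table: intent name -> trigger words.
KEYWORD_INTENTS = {
    "play_music": {"play", "song", "track"},
    "set_temperature": {"temperature", "thermostat", "heat"},
}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def keyword_intent(text):
    """Return the intent whose trigger words overlap most with the text, or None."""
    tokens = set(tokenize(text))
    best, best_hits = None, 0
    for intent, triggers in KEYWORD_INTENTS.items():
        hits = len(tokens & triggers)
        if hits > best_hits:
            best, best_hits = intent, hits
    return best


def naive_triple(text):
    """Toy (subject, relation, object) extractor for 'X <verb> Y' sentences.

    Only a fixed, hypothetical verb list is recognised; a real system
    would derive this from a syntactic parse instead of a regex.
    """
    pattern = r"^(.+?)\s+(directed|wrote|founded)\s+(.+?)[.?!]?$"
    match = re.match(pattern, text.strip(), re.IGNORECASE)
    if not match:
        return None
    subject, relation, obj = match.groups()
    return (subject, relation.lower(), obj)


# Keyword spotting cannot tell a command from its negation.
print(keyword_intent("Turn up the temperature"))
print(keyword_intent("Don't turn up the temperature"))

# Two sentences with the same words but opposite meanings.
a = "Christopher Nolan directed Inception."
b = "Inception directed Christopher Nolan."
print(set(tokenize(a)) == set(tokenize(b)))
print(naive_triple(a))
print(naive_triple(b))
```

Both temperature commands map to the same intent, and the two "directed" sentences share an identical bag of words, yet the extracted triples differ in who did what to whom. That ordering of entities and relations is the kind of information a developer would need in order to answer a question rather than merely match a topic.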
“Today you can ask Siri fact-based questions, like who directed a movie, or who [sings] a particular song. But you can’t ask it a more useful question, like when is Stanford Spring break?” he adds. “It can’t take a sentence from the Internet and then find the direct answer in that sentence and then return that to the user.”
Instead Siri usually performs a Google search and serves those results to the user — leaving users to do the last mile legwork of extracting an actual answer.
Plasticity’s promise is to cut out that last step by returning the right answer directly to the user.
“Our core technology uses deep learning to figure out the base level of NLP tags — so that’s things like parts of speech, syntax dependency tree. So we use machine learning on the base to figure that out, and we use TensorFlow and Google’s SyntaxNet module,” says Patel. “And then on top of that we’ve built custom C++ code that basically operates a lot more accurately and a lot faster than a lot of the competitors out there.”
Of course if the Internet is your oracle then there’s limitless scope to return not truthful answers but full-on falsities, fake news and other skewed and prejudiced views — as indeed we’ve already seen Google Home do. Oops. So how does Plasticity avoid its technology falling into a similar trap and ensure accuracy in the answers its API can help provide?
“What we do right now is we run it only over Wikipedia,” says Sands on this. “Then the plan from there is to slowly expand whilst still maintaining that accuracy that you’re talking about.”
The API has been more than 1.5 years in development at this point, and they claim “much higher accuracy and much higher speed” at parsing sentences than IBM Watson, for example.
Initially, Patel says they focused on areas that existing, keyword-based NLP systems weren’t handling well, such as lists, and then continued building out the complexity to handle other “linguistic edge cases”.
They name Google as their main competitor at this point: given the company’s stated aim of organizing the world’s information, building systems that can understand text is a clear necessity for Mountain View’s mission. Even so, they reckon there’s room for another NLP player to offer similar services to the wider market.
“[Google has] put a lot of work into understanding text on the Internet to do their instant answer question and answering… But we really think that there’s still a space in the market for a solution for everybody else out there, who’s not Google, who’s not putting in hundreds of millions of dollars of investment into machine learning — and we really think they’ve got no ambition to become a leader in NLP. For example Apple actually outsources their question and answering on Siri to Wolfram Alpha.
“So we think there’s a significant place in the market to be the natural language processing solution and knowledge graph solution for all the other artificial intelligence products out there.”
And while their first focus is on building NLP tech that can understand semantic structure and perform granular linguistic analysis, Patel says they may also expand to other areas — such as program synthesis — to add more abilities to the API in future.
Funding-wise, they’re still in the process of closing out their seed round but have taken funding from multiple investors at this point, including First Round Capital’s Dorm Room Fund and General Catalyst’s Rough Draft Ventures. They’ll be looking for more investment after YC demo day, they add.