Three-year-old Sentisis is a natural language processing (NLP) startup whose founder spotted an opportunity in technology-powered sentiment analysis of Spanish-language conversations on social media: the prevailing focus on data-mining online English had left Spanish relatively overlooked.
It’s not a small opportunity, either: more than 500 million people speak Spanish globally. That’s a lot of linguistic insight to be unlocked by NLP.
“I studied a Master’s in Chicago,” says CEO and co-founder Jorge Peñalva, explaining the company’s origins. “I was researching about sentiment analysis and I realized there was a great opportunity in the Spanish language because most of the research was based in English.”
The biggest challenge for doing NLP in Spanish was managing the various “versions” of the language, as Peñalva puts it: Mexican Spanish, Colombian, other South American flavors, Hispanic U.S., or indeed European Spanish.
“They all have different expressions but they share the same syntactic structure. Semantically they are quite similar except for their local expressions. Which usually express sentiment or irony, etc,” he says. “So what we did was we developed a methodology to adapt the technology to each industry and each country.”
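To make that idea concrete, here is a minimal sketch of how a shared vocabulary might be layered with country-specific expressions. It is not Sentisis’ actual method: the function name `score`, the lexicons, the locale codes and the polarity values are all assumptions invented for illustration.

```python
# Hypothetical illustration only: the lexicons and scores below are invented
# for this example and are not Sentisis data or its actual methodology.

# Words whose polarity is assumed to be shared across Spanish variants.
BASE_LEXICON = {"bueno": 1, "excelente": 2, "malo": -1, "terrible": -2}

# Local expressions assumed to carry sentiment in one variant only.
LOCALE_EXPRESSIONS = {
    "es-MX": {"padre": 2, "chafa": -2},
    "es-ES": {"guay": 2, "cutre": -2},
    "es-AR": {"copado": 2, "trucho": -2},
}


def score(text: str, locale: str) -> int:
    """Sum the polarity of known tokens; local entries override shared ones."""
    lexicon = {**BASE_LEXICON, **LOCALE_EXPRESSIONS.get(locale, {})}
    tokens = text.lower().split()
    # Strip surrounding punctuation, including Spanish opening marks.
    return sum(lexicon.get(token.strip(".,;:!?¡¿"), 0) for token in tokens)


print(score("¡Qué padre está el servicio!", "es-MX"))
print(score("¡Qué padre está el servicio!", "es-ES"))
```

In this toy version the same sentence reads as positive under the Mexican lexicon and neutral under the European one, which is the kind of gap a per-country adaptation would be meant to close.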
After launching its SaaS platform in April 2013 with its first customer, Sentisis now has more than 50 users, ranging from big brands to advertising agencies, to politicians, to TV (it’s working with the Spanish version of Big Brother, for instance).
The platform uses different linguistic technology on the back-end, depending on the customer’s flavor of Spanish and their industry. It is also tailored to deliver the specific type of sales, marketing or customer service insights they’re after (purchase intent, lead detection, sentiment analysis, and so on), serving these in a structured dashboard format and via bespoke reports. Linguistically speaking, Sentisis covers around 40 industries at this point, according to Peñalva.
“We extract information that is relevant to a certain industry — for example for a consumer industry it’s very interesting to detect purchase intent… whereas some other industries that rely heavily on customer service it’s important to understand which comments are more urgent… This is the adaptations that allow us to extract actionable data,” he adds.
“We have linguists that work in a very agile way to extract rules that are relevant for industries or countries. So for our computational linguists the problem is usually the same: it’s knowing what is the importance of context of the texts.”
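As a rough sketch of what industry-specific rules could look like in practice, the snippet below tags comments with different labels depending on the industry. Everything in it is hypothetical: the `tag` function, the `INDUSTRY_RULES` table, the label names and the Spanish phrases are assumptions made up for this example, not rules written by Sentisis’ linguists.

```python
import re
from typing import List

# Hypothetical rules invented for illustration; in the setup described above,
# real rules would be written by linguists per industry and country.
INDUSTRY_RULES = {
    "consumer": [
        ("purchase_intent",
         re.compile(r"\b(quiero comprar|voy a comprar|dónde compro)\b")),
    ],
    "customer_service": [
        ("urgent",
         re.compile(r"\b(urgente|sigo esperando|nadie me responde)\b")),
    ],
}


def tag(comment: str, industry: str) -> List[str]:
    """Return the labels whose pattern matches the comment for this industry."""
    text = comment.lower()
    rules = INDUSTRY_RULES.get(industry, [])
    return [label for label, pattern in rules if pattern.search(text)]


print(tag("Quiero comprar el nuevo modelo", "consumer"))
print(tag("Sigo esperando una respuesta, es urgente", "customer_service"))
```

Keeping the rules in a table keyed by industry means the same comment can be read differently for different customers; simple keyword patterns like these ignore context, which is the harder problem Peñalva describes.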
The startup has just closed a $1.3 million Series A funding round, led by international investment firm Axon Partners Group and the Fundación José Manuel Entrecanales (a firm that works with the Spanish-based fund FIDES). Existing investors, including San Francisco-based seed fund Startcaps Ventures and 500 Startups, also participated in the round. Sentisis went through 500 Startups’ accelerator program last May in Mexico. It had previously raised around $500,000 in seed funding.
Why is it taking more funding now, given it’s bringing in revenue? Peñalva says that’s down to the opportunity it believes it has to scale up and dominate the Spanish-speaking market.
“We think there is a great opportunity in the Spanish-speaking market and we believe that other competitors are not so focused in this language — we want to be the leaders in this market. In all different countries. Not only in Spain but in Latin America,” he says. “And also there is a great opportunity in the U.S., focused on Hispanics, which are now about 50 million people. Brands are starting to pay attention to how to understand these customers as a different kind of customer.”
There are also, he reckons, generally more insights, and therefore more value, to be unlocked from tapping into Spanish-language conversations, given that they have been relatively under-mined for data thus far, especially in regions such as Latin America. “We believe that both with linguistics and data science we can still extract more value for our customers. We believe it’s a very small percentage of the value that there is in social media that is being used,” he adds.
U.S. competitors in the space include Salesforce’s Radian6, Sysomos and Brandwatch, although Peñalva believes there may be scope for partnerships with such larger players, given they have not been specializing in Spanish. Various local players also exist in other markets, such as Argentina’s Socialmetrix.
Update: A Salesforce spokeswoman has been in touch to note that Salesforce Social Studio (which Radian6 evolved into) has “several partners including Clarabridge and Bitext that specialize in Spanish sentiment technically” and which already integrate into its offering.