Aylien launches news analysis API powered by its deep learning tech

Text analysis startup Aylien, which uses deep learning and NLP algorithms to parse text and extract intel from documents for its customers, has launched a new tool specifically focused on analyzing written news content.

“The idea for the News API is to give access to the news content that is out there enriched and in real-time to developers and data scientists,” says co-founder Parsa Ghaffari. “It’s a very data and analytics centric approach to news.”

The Dublin-based startup says it’s utilizing core text analysis tech powering its existing text API product, which launched back in February 2014 — but this time it’s focusing exclusively on news content and also doing a little more of the analytical heavy lifting for its customers.

“We decided to simplify the use case a little bit by collecting and analyzing the news documents on our end, rather than giving them the tools to do that themselves. So this was born out of that,” says Ghaffari.

He adds that the text analysis API was already being used by news and media companies “to make sense of news articles at scale”, so the team has now stepped in with a tailored product to better serve that demand.

Ghaffari says they’re targeting the News API at developers, data scientists and “solution builders” in verticals such as publishing, PR, news aggregation, newsreader apps, hedge funds, media monitoring, and voice of the customer analysis solutions. So there will evidently be some overlap/cannibalization of existing Aylien users.

Its SaaS Text API product has nearly 20,000 subscribers at this point, with Ghaffari flagging up the likes of Sony, The World Economic Forum and Complex Media as “notable customers”.

While Ghaffari mentioned a plan to launch a news API all the way back in 2014, when TechCrunch last spoke to him, he says the idea then was to build a bare bones news ticker. The News API, by contrast, is a fully featured product in its own right, letting users perform granular search queries: for example, asking for news stories written about Donald Trump that have a negative sentiment and were published by news outlets based in Wisconsin.
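To make that kind of query concrete, here is a minimal Python sketch of the same three-way filter (entity, sentiment, outlet location) run over a handful of made-up records. The `Article` fields, the `search` helper and the sample headlines are all hypothetical illustrations, not Aylien's actual schema or API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Article:
    # Hypothetical record shape, invented for illustration only.
    title: str
    entities: List[str]   # people, brands, organizations mentioned
    sentiment: str        # "positive", "neutral" or "negative"
    source_state: str     # where the publishing outlet is based


def search(articles: List[Article], entity: str, sentiment: str,
           source_state: str) -> List[Article]:
    """Return only the articles that satisfy all three filters."""
    return [
        a for a in articles
        if entity in a.entities
        and a.sentiment == sentiment
        and a.source_state == source_state
    ]


# Made-up sample data, not real headlines.
articles = [
    Article("Rally draws protest", ["Donald Trump"], "negative", "Wisconsin"),
    Article("Primary preview", ["Donald Trump"], "neutral", "Wisconsin"),
    Article("Campaign stop criticized", ["Donald Trump"], "negative", "Ohio"),
    Article("Dairy prices fall", ["USDA"], "negative", "Wisconsin"),
]

matches = search(articles, "Donald Trump", "negative", "Wisconsin")
print([a.title for a in matches])
```

Only the first sample record passes all three conditions. In the real product the entity and sentiment labels would come from Aylien's analysis engine rather than being typed in by hand.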

The product also serves up automated summaries of retrieved news articles; points to related stories; profiles social media performance; charts the volume of stories on a particular topic over time; shows sentiment breakdown; and details article length.

Users can search for news by byline to track particular journalists’ output — a useful feature for PRs wanting to intelligently target pitches. (Rather than, *hint-hint*, repeat copy-pasting ‘I read your story about X and thought you’d be interested in writing about Y’… )

I ask Ghaffari whether he used the news API tool to determine which TC journalist to send his pitch to — and he confirms he did (as you’d hope). So, in this one example at least, the tech’s targeting relevance was fair. (It did also suggest he approach my colleague Fred, who wrote the prior story on Aylien.)

The tool draws content from a (human) compiled list of “thousands” of news sources — so, as with much AI tech, there’s still a key role for the human brain when it comes to filtering/sanity checking source material, although Ghaffari reckons this too could be automated in time.

“At the moment it’s a manually curated list of sources that we monitor,” he says, adding: “We are looking at ways to automate source discovery… But for the initial launch we wanted to have really high quality content, we didn’t want to have any noise in there. There are challenges if you increase the number of sources, you’re going to get a lot of duplicate content, a lot of low quality content… We shouldn’t do that unless we are confident we can provide measures for our users to filter that content.”

Aylien’s analysis engine currently supports six languages — including English, Spanish, German and French — which it’s hoping to ramp up to 15 by July.

Discussing how the core tech works, Ghaffari tells TechCrunch: “We utilize Deep Learning and NLP to understand news articles better, by extracting things such as mentions of entities (people, brands, organizations, products, etc), author’s sentiment, high level category and topical structure of each article, and so on, and we use this information in aggregate to train predictive models that can predict things such as best targets for a press release or most popular topics within a niche, which are tremendously valuable to a publisher, journalist or a PR person.”

“Compared to classical Machine Learning, Deep Learning significantly reduces the need for manually labelled data, and makes it easier to ‘hop’ from one language to another, or even from text to image and vice versa, without losing a lot of information,” he adds.

“Our proprietary NLP engine learns how to perform multilingual language processing, which then it applies to news content, just by looking at large volumes of text, and that makes it much more scalable, and capable of learning new languages (which is crucial in our today’s fragmented, globalized world) than what is available out there.”

In terms of competing products, he name checks the likes of Cision’s media monitoring suite, plus products from Kantar Media and Moreover Technologies, but argues that Aylien’s text analysis tech is “much more advanced”, given it’s been honing its algorithmic smarts for more than three years.

“There are similar offerings by Diffbot, by Alchemy API [and IBM Watson], but we have a lot more of a heavy focus on the fact that this is news content, not just any kind of document,” he adds. “Those guys look at news articles as just any other webpage. So that’s one key differentiator.”

Pricing for Aylien’s News API SaaS starts at $49 per month — for which the customer gets 30,000 articles. After that, if they want to analyze more data, there’s a sliding scale price per article.
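For a rough sense of the numbers, the base tier works out to about $0.0016 per article ($49 divided by 30,000). The Python sketch below shows that arithmetic and a simple bill estimate. The per-article overage rate has not been stated, so `overage_rate_usd` is a hypothetical input, and a single flat rate stands in for the sliding scale as a simplifying assumption.

```python
BASE_PRICE_USD = 49.0        # monthly price quoted above
INCLUDED_ARTICLES = 30_000   # articles covered by the base price


def monthly_cost(articles_used: int, overage_rate_usd: float) -> float:
    """Estimate a monthly bill.

    overage_rate_usd is a hypothetical flat per-article rate; the actual
    sliding scale is not specified here.
    """
    extra = max(0, articles_used - INCLUDED_ARTICLES)
    return BASE_PRICE_USD + extra * overage_rate_usd


# Effective per-article price if the full base allowance is used.
effective_rate = BASE_PRICE_USD / INCLUDED_ARTICLES
print(f"{effective_rate:.5f}")  # prints 0.00163
```

Any usage at or below 30,000 articles returns the flat $49. Beyond that, the estimate is only as good as whatever rate is plugged in.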

The startup, which was founded back in November 2012, has raised $1.3 million to date, most recently taking in a €580,000 round from SOSV and Enterprise Ireland just last month.