Diffbot is a startup that’s trying to make sense of the mass of information available on the web via robotic vision and computer learning, and it’s doing so one chunk at a time. Previously, the company released a comprehensive API for identifying and deriving key info from article pages on the web, and now it’s launching a Product Page API to do the same for ecommerce and shopping sites.
The new API will allow Diffbot to crawl the web and parse information such as price, discounts, shipping, images, descriptions and SKUs, and then translate that into an immediately usable database format for devs to mine and repurpose however they wish. This is incredibly useful for comparison shopping sites, for instance, but Diffbot CEO and founder Mike Tung says the company has also had a lot of interest in the product from collecting, bookmarking and listing sites similar to Pinterest.
“Product discovery type services where the users themselves are submitting links to products [is a use case],” he said. “We did some data analysis last year and 8 percent of the links that people are sharing on Twitter are products, and there are a lot of sites where the entire concept of the site is just to share links to products with other users on the site. With the product API now it’s not just a link, with a picture; you know the price and all the product details.”
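To make the idea concrete, here is a minimal sketch of how a comparison shopping site might work with that kind of structured product data once it has been extracted. The `ProductRecord` class, its field names and the example URLs are all hypothetical, chosen only to mirror the fields mentioned above (price, discounts, shipping, images, SKUs); they are not Diffbot’s actual response format.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ProductRecord:
    """Hypothetical structured record; field names are illustrative, not Diffbot's schema."""
    url: str
    title: str
    price: float
    shipping: float = 0.0
    discount: float = 0.0  # assumed to be an absolute amount off the listed price
    sku: Optional[str] = None
    images: List[str] = field(default_factory=list)

    def total_cost(self) -> float:
        # What the shopper would actually pay: price minus discount plus shipping.
        return round(self.price - self.discount + self.shipping, 2)


def cheapest(records: List[ProductRecord]) -> ProductRecord:
    """Return the offer with the lowest total cost."""
    return min(records, key=lambda r: r.total_cost())


# Two made-up offers for the same item from two made-up shops.
offers = [
    ProductRecord("https://shop-a.example/item", "Widget", price=24.99, shipping=5.00),
    ProductRecord("https://shop-b.example/item", "Widget", price=27.99, discount=3.00),
]
best = cheapest(offers)
print(best.url, best.total_cost())
```

The point is that once a product page has been reduced to fields like these, comparing offers or enriching a shared link becomes ordinary data handling rather than page scraping.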
As with the Articles API before it, Diffbot will offer the Product API on a usage-based software-as-a-service model, which should allow everyone from small companies just starting out to big brands to take advantage. Diffbot’s current clients include AOL (disclosure: they own TC), as well as Betaworks, CBS Interactive, StumbleUpon and more. The Product API opens up a whole new category of potential customers for the Palo Alto-based startup, which has raised just over $2 million to date and is not currently looking for any more, according to Tung, as the company is already happy with the revenue its products are generating.
Diffbot plans to release a whole slew of APIs to target different page categories, and Tung says that the engine behind it can easily learn new categories without much in the way of additional engineering. Preparing a new category for general release involves helping the Diffbot robotic brain to essentially learn to spot pertinent information on its own, and that means talking to stakeholders to identify exactly what kind of information it should be looking for. Sometimes that’s obvious, as with price and description for products, but other fields, like SKU and manufacturer ID, are less so.
Adding to Diffbot’s existing library of Home page, Article page and Image page identification APIs, the Product page release is a key new addition to its platform, and one that should see high demand. Diffbot’s progress is impressive, and this is definitely a startup to watch as it continues to lay the pipes that work in the background to identify and make sense of the Internet’s mass of available data.