Diffbot is a geeky and incredibly interesting technology that uses bots, algorithms, computer vision and artificial intelligence to process content on the Web the way a human being would. “The entire Internet can be broken down into 30 different page types,” explains co-founder Mike Tung, also known as “Diffbot Mike,” “and Diffbot can identify them all.” Diffbot knows the difference between a social network profile, a blog post, a site’s front page, a product page, an event page and dozens more.
Today, Diffbot is releasing its first set of APIs, now open to all developers for free. The launch has the potential to dramatically impact the types of applications developers can build, and for consumers, it means a whole host of intelligent applications are about to emerge.
The New APIs: On-Demand & Follow
With the two APIs available now, developers can build apps that automatically extract meaning from pages, apps that understand what’s trending and who’s talking about it, apps that provide RSS feeds where none existed before, and apps that read only the relevant parts of a webpage aloud, ignoring ads and header and footer copy.
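Diffbot’s own extraction relies on computer vision and statistical analysis of the rendered page, which is well beyond a short example. As a rough illustration of the “keep the article, drop the chrome” idea only, here is a minimal Python sketch that assumes the page marks its chrome with HTML5 `header`, `footer`, `nav` and `aside` elements. It discards text inside those (plus `script` and `style`) and keeps the rest. The names `SKIP_TAGS`, `MainTextExtractor` and `extract_main_text` are hypothetical, and ad removal is not attempted.

```python
from html.parser import HTMLParser

# Tags whose contents this sketch treats as page chrome rather than article text.
# A hypothetical tag-based heuristic, NOT Diffbot's visual/statistical method.
SKIP_TAGS = {"header", "footer", "nav", "aside", "script", "style"}


class MainTextExtractor(HTMLParser):
    """Collect text nodes, ignoring anything nested inside SKIP_TAGS."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # greater than 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        # Keep only non-empty text that is outside every skipped element.
        if text and self.skip_depth == 0:
            self.chunks.append(text)


def extract_main_text(html: str) -> str:
    parser = MainTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

For example, `extract_main_text("<header>Site Name</header><p>Body text.</p><footer>Copyright</footer>")` returns `"Body text."`. Pages that don’t use these semantic tags defeat the heuristic entirely.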
And that’s just for starters. Future APIs will let developers turn event pages into calendar appointments, convert social network profiles into vCards, or extract shipping prices and reviews from product pages, among other things. Diffbot doesn’t have a set roadmap, but it expects to launch these additional APIs over the next few months.
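Diffbot hasn’t published an event API yet, so the input below is an assumption: a hypothetical dict with `title`, `location`, `start` and `end` keys, the kind of fields an event-page extractor might return. The sketch covers only the last step, turning those fields into an iCalendar (RFC 5545) entry that calendar apps can import. `event_to_ics` and the `PRODID` value are illustrative, and line folding for long values is omitted.

```python
import uuid
from datetime import datetime, timezone


def _escape(text: str) -> str:
    # RFC 5545 TEXT escaping: backslash first, then semicolon, comma, newline.
    return (text.replace("\\", "\\\\").replace(";", "\\;")
                .replace(",", "\\,").replace("\n", "\\n"))


def _fmt(dt: datetime) -> str:
    # UTC date-time such as 20111115T180000Z. Pass timezone-aware values:
    # astimezone() would interpret a naive datetime as system local time.
    return dt.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")


def event_to_ics(event: dict) -> str:
    """Build a one-event iCalendar document from hypothetical extracted fields."""
    lines = [
        "BEGIN:VCALENDAR",
        "VERSION:2.0",
        "PRODID:-//example//event-sketch//EN",
        "BEGIN:VEVENT",
        f"UID:{uuid.uuid4()}@example.invalid",
        f"DTSTAMP:{_fmt(datetime.now(timezone.utc))}",
        f"DTSTART:{_fmt(event['start'])}",
        f"DTEND:{_fmt(event['end'])}",
        f"SUMMARY:{_escape(event['title'])}",
        f"LOCATION:{_escape(event['location'])}",
        "END:VEVENT",
        "END:VCALENDAR",
    ]
    # iCalendar requires CRLF line endings.
    return "\r\n".join(lines) + "\r\n"
```

A vCard converter for profile pages would follow the same pattern with `BEGIN:VCARD` and its own property names.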
The first two APIs, available today, are On-Demand and Follow.
What Can Diffbot Actually Do?
These same APIs are already being used by companies like speech recognition system maker Nuance, AOL (disclaimer: TechCrunch is owned by AOL), social media monitoring firm SocMetrics, and others.
AOL uses Diffbot to extract the title, author, image, text, videos, topics and other metadata for its new iPad mag, AOL Editions. Nuance uses the technology to improve its natural language processing in a product for doctors, which requires comprehension of complex medical terminology. SocMetrics sends bit.ly shortened links to Diffbot to get the full article text and topics, so it can determine which social media users are talking about which topics the most.
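SocMetrics’ actual pipeline isn’t described beyond the above, so this is a hypothetical sketch of the final tallying step only. It assumes each shared link has already been run through an article-extraction call that returned a list of topics, giving `(user, topics)` pairs; it then counts topics per user and ranks users for a given topic. `topics_by_user` and `top_users_for` are illustrative names.

```python
from collections import Counter, defaultdict


def topics_by_user(shares):
    """shares: iterable of (user, topics) pairs, where topics is the list of
    topics extracted from the article the user linked to (assumed input)."""
    counts = defaultdict(Counter)
    for user, topics in shares:
        counts[user].update(topics)
    return counts


def top_users_for(topic, counts, n=3):
    # Rank users by how many of their shared articles mention `topic`;
    # ties are broken alphabetically so the output is deterministic.
    scored = [(user, c[topic]) for user, c in counts.items() if c[topic] > 0]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:n]
```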
These are just a few big-name examples. There are smaller but equally innovative use cases, too, like Hacker News Radio, which reads Hacker News stories and comments aloud to you. Or FeedBeater (one of Diffbot’s first creations), which automatically turns any URL into an RSS feed. Or a Diffbot-generated Twitter feed that tracks changes to the website of the city of São Paulo, Brazil (the site lacks RSS) and tweets the updates.
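The page-watching Twitter feed boils down to change detection: fetch a page periodically, compare it with the previous snapshot and announce what’s new. Below is a minimal sketch of the comparison step, assuming the snapshots are already-extracted plain text with one item per line; fetching, scheduling and posting are left out. It uses Python’s standard `difflib.ndiff` and trims each update to 140 characters, Twitter’s limit at the time. `new_lines` and `update_messages` are hypothetical names, not Diffbot’s implementation.

```python
import difflib

TWEET_LIMIT = 140  # Twitter's character limit at the time (assumption of this sketch)


def new_lines(old_snapshot: str, new_snapshot: str) -> list[str]:
    """Return lines present in the new snapshot but not the old one, in order."""
    diff = difflib.ndiff(old_snapshot.splitlines(), new_snapshot.splitlines())
    # ndiff prefixes added lines with "+ "; "- ", "  " and "? " lines are ignored.
    return [line[2:] for line in diff if line.startswith("+ ")]


def update_messages(old_snapshot: str, new_snapshot: str, limit: int = TWEET_LIMIT) -> list[str]:
    """Turn each added, non-blank line into a message no longer than `limit`."""
    messages = []
    for line in new_lines(old_snapshot, new_snapshot):
        line = line.strip()
        if not line:
            continue
        # Truncate with a single ellipsis character so the total stays within limit.
        messages.append(line if len(line) <= limit else line[: limit - 1] + "…")
    return messages
```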
The new self-serve platform for developers is free for up to 50,000 API calls per month. The Cloud plan costs $500 for 100,000 calls, plus $0.002 for each additional call. The Managed plan for enterprises has custom pricing.
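The pricing arithmetic is easy to sketch, under one assumption the article doesn’t spell out: that the Cloud plan’s $500 fee and its 100,000 included calls reset each month, like the free tier’s 50,000. This small calculator uses `Decimal` to avoid floating-point rounding on fractions of a cent; the constant and function names are illustrative.

```python
from decimal import Decimal

FREE_CALLS_PER_MONTH = 50_000
CLOUD_BASE_FEE = Decimal("500")
CLOUD_INCLUDED_CALLS = 100_000
CLOUD_OVERAGE_PER_CALL = Decimal("0.002")


def cloud_plan_cost(calls: int) -> Decimal:
    # Base fee covers the first 100,000 calls; every call beyond that is $0.002.
    overage = max(0, calls - CLOUD_INCLUDED_CALLS)
    return CLOUD_BASE_FEE + overage * CLOUD_OVERAGE_PER_CALL


def plan_for(calls: int) -> tuple[str, Decimal]:
    # Under the free limit, the self-serve tier costs nothing.
    if calls <= FREE_CALLS_PER_MONTH:
        return ("free", Decimal("0"))
    return ("cloud", cloud_plan_cost(calls))
```

At exactly 100,000 calls the Cloud plan works out to $0.005 per call; at 250,000 calls it costs $500 + 150,000 × $0.002 = $800. Managed (enterprise) pricing is custom, so it isn’t modeled.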
Diffbot was founded by Mike Tung and Leith Abdulla, both Stanford PhD students on leave to build the company. The idea sprang from Tung’s desire to automatically track new assignments posted on the class website. Diffbot was also the first startup funded by Stanford’s incubator program, now called StartX (formerly SSE Labs).