Diffbot, the super-geeky/awesome visual learning robot technology which aims to “see” the web the way that people do, is today announcing a new infusion of capital. The company has closed $2 million in funding from a number of technology veterans, including EarthLink founder Sky Dayton; Andy Bechtolsheim, co-founder of Sun Microsystems; Joi Ito, Director of MIT Media Lab; Brad Garlinghouse, CEO of YouSendIt (and formerly of TechCrunch parent company AOL); Maynard Webb, Chairman of the Board at LiveOps, formerly eBay COO; Elad Gil, VP of Corporate Strategy at Twitter; Jonathan Heiliger, former VP of Technical Operations at Facebook; Redbeacon co-founder Aaron Lee; and founder of VitalSigns Montgomery Kersten.
Matrix Partners also participated in the round. Of the new investors, Sky Dayton will be the first to join Diffbot’s board and will be taking an active role in the company, including plans to go hands-on with various Diffbot projects.
Last August, the company publicly debuted its first APIs, which allow developers to build apps that can automatically extract meaning from web pages. For example, the Front Page API is able to analyze site homepages, and understands the difference between article text, headlines, bylines, ads, etc. The Article API can then extract clean article text, images and videos. Another example of Diffbot in action is the “follow API,” which can track the changes made to a website.
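To make the change-tracking idea concrete, here is a minimal sketch of how a developer might compare two snapshots of a page's extracted text. It is an illustration only, not Diffbot's implementation and not its API: the function name `changed_lines` and the sample snapshots are hypothetical, and the comparison uses Python's standard `difflib` module rather than anything Diffbot has described.

```python
import difflib


def changed_lines(old_text: str, new_text: str) -> list[str]:
    """Return the lines added to ("+ ") or removed from ("- ") a page.

    Hypothetical helper for illustration; it compares plain text only.
    """
    diff = difflib.ndiff(old_text.splitlines(), new_text.splitlines())
    # ndiff also emits unchanged lines ("  ") and hint lines ("? "); drop both.
    return [line for line in diff if line.startswith(("+ ", "- "))]


# Two made-up snapshots of the same homepage, taken at different times.
old = "Headline A\nStory one\nStory two"
new = "Headline B\nStory one\nStory two\nStory three"

changes = changed_lines(old, new)
for line in changes:
    print(line)
```

A real tracker would also need to fetch pages on a schedule and ignore noise such as rotating ads, which is where understanding the structure of a page would matter.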
Today, Diffbot has categorized the web into about 20 different page types, including homepages and article pages, which are the first two types it can now identify. Going forward, Diffbot plans to train its bots to recognize all the other types of pages, including product pages, social networking profiles, recipe pages, review pages, and more.
Its APIs have been put to use by AOL (again: disclosure, TC parent) in its news magazine AOL Editions, as well as by companies like Nuance, SocMetrics, and others. Diffbot says it’s now processing 100 million API calls per month on behalf of its customers. Thousands of developers are using the APIs, the company notes, but paying customers are only in the “tens.” Correction: we’re now told they have “a lot more!”
Diffbot founder and CEO Michael Tung (aka “Diffbot Mike”) says the new funding will be put towards new hires and expanding its resources. “More than that, we’re receiving a huge vote of confidence from veterans who have built massive companies and understand the fine points of building for scale, maintaining uptime and delivering the absolute highest standards of service.”
Tung is a patent attorney and Stanford PhD student who left the doctoral program to pursue Diffbot, thanks to seed funding from Stanford’s incubator, StartX. Diffbot was StartX’s first investment. With today’s funding, Diffbot’s total raise is $2 million and change.
Diffbot provides a set of APIs that enable developers to easily use web data in their own applications. Diffbot analyzes documents much like a human would, using the visual properties to determine how the parts of the page fit together. The algorithm uses statistical techniques to automatically and reliably determine the structural organization of a page, independent of layout and the language of the text. Diffbot’s technology is used by some of the world’s largest content companies. The company was the...