Diffbot Raises $2 Million Angel Round For Web Content Extraction Technology

Diffbot, the super-geeky/awesome visual learning robot technology which aims to “see” the web the way that people do, is today announcing a new infusion of capital. The company has closed $2 million in funding from a number of technology veterans, including EarthLink founder Sky Dayton; Andy Bechtolsheim, co-founder of Sun Microsystems; Joi Ito, Director of MIT Media Lab; Brad Garlinghouse, CEO of YouSendIt (and formerly of TechCrunch parent company AOL); Maynard Webb, Chairman of the Board at LiveOps, formerly eBay COO; Elad Gil, VP of Corporate Strategy at Twitter; Jonathan Heiliger, former VP of Technical Operations at Facebook; Redbeacon co-founder Aaron Lee; and founder of VitalSigns Montgomery Kersten.

Matrix Partners also participated in the round. Of the new investors, Sky Dayton will be the first to join Diffbot’s board and will be taking an active role in the company, including plans to go hands-on with various Diffbot projects.

Last August, the company publicly debuted its first APIs, which allow developers to build apps that can automatically extract meaning from web pages. For example, the Front Page API can analyze site homepages and understand the difference between article text, headlines, bylines, ads, etc. The Article API can then extract clean article text, images and videos. Another example of Diffbot in action is the “follow API,” which can track the changes made to a website.

Today, Diffbot has categorized the web into about 20 different page types, including homepages and article pages, which are the first two types it can now identify. Going forward, Diffbot plans to train its bots to recognize all the other types of pages, including product pages, social networking profiles, recipe pages, review pages, and more.

Its APIs have been put to use by AOL (again: disclosure, TC parent) in its news magazine AOL Editions, as well as by companies like Nuance, SocMetrics, and others. Diffbot says it’s now processing 100 million API calls per month on behalf of its customers. Thousands of developers are using the APIs, the company notes, but paying customers are only in the “tens.” Correction: we’re now told they have “a lot more!”

Diffbot founder and CEO Michael Tung (aka “Diffbot Mike”) says the new funding will be put towards new hires and expanding its resources. “More than that, we’re receiving a huge vote of confidence from veterans who have built massive companies and understand the fine points of building for scale, maintaining uptime and delivering the absolute highest standards of service.”

Tung is a patent attorney and Stanford PhD student who left the doctoral program to pursue Diffbot, thanks to seed funding from Stanford’s incubator, StartX. Diffbot was StartX’s first investment. With today’s funding, Diffbot’s total raise is $2 million and change.