Cloudera And Facebook Shed More Light On Hadoop Integration

For a startup founded less than a year ago, Cloudera has seen some pretty amazing growth. Backed by an impressive list of investors and advisors and run by a team of experienced technology veterans, Cloudera commercially distributes and services Hadoop. The model is similar to Red Hat's commercial distribution of Linux.

Hadoop is an open-source Java software framework, fostered within the Apache Software Foundation, that grew out of an implementation of the computing infrastructure Google described in its published papers. Hadoop supports distributed applications running on large clusters of commodity computers, processing enormous amounts of data. Cloudera distributes Hadoop and provides services around the technology. Hadoop is currently used by most of the giants in the space, including Google, Yahoo, Facebook, Amazon, AOL, Baidu and more. To date, Cloudera has raised $11 million in funding from Accel Partners and Greylock Partners.
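To make "distributed applications" a little more concrete: Hadoop jobs are typically written in the MapReduce style that Google described, where a map step emits key/value pairs, the framework groups them by key (the "shuffle"), and a reduce step combines each group. What follows is a minimal single-process Python sketch of that model using the customary word-count example. It does not use Hadoop's actual Java API, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(record: str) -> Iterator[tuple[str, int]]:
    # Map: emit (word, 1) for every word in one input record.
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Shuffle: group intermediate values by key, as the framework
    # does between the map and reduce phases.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Reduce: combine all values that share a key.
    return key, sum(values)


def word_count(records: Iterable[str]) -> dict[str, int]:
    intermediate = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Each string stands in for one input split.
    print(word_count(["Hadoop stores data", "Hadoop processes data"]))
```

On a real cluster the map and reduce calls would run in parallel on many machines, with the shuffle moving intermediate data across the network; this sketch only shows the data flow.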

Cloudera is organizing and hosting a conference, Hadoop World: NYC, in a few weeks to support the growing Apache Hadoop community. Facebook, Yahoo, Amazon Web Services and IBM will all present on how they use the technology to handle large volumes of data.

Facebook is one of the more interesting Hadoop use cases, says Cloudera co-founder Christophe Bisciglia. Facebook software engineer Ashish Thusoo said that before Hadoop, the social network used conventional RDBMS-based data warehousing technologies, and that it switched to open-source Hadoop because of its scalability, cost and flexibility.

Facebook uses both Hadoop and Hive, a data warehouse infrastructure built on top of Hadoop that provides tools for easy data summarization, ad hoc querying and analysis of large datasets stored in Hadoop files. For example, Hive makes it easy to create business reports and to run the data aggregation and analysis used to drive Facebook products and to tackle model generation and optimization problems for ads.
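Hive queries are written in a SQL-like language, HiveQL, which Hive translates into MapReduce jobs. As a rough illustration of the kind of summarization described above, the Python sketch that follows computes a per-country row count and click total over an invented `ad_events` table, structured as a map step keyed on the GROUP BY column followed by a per-key reduce. The table, its columns, and the function names are all hypothetical assumptions for illustration, not anything Facebook has described.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class AdEvent(NamedTuple):
    # Hypothetical row schema for an illustrative "ad_events" table.
    country: str
    clicks: int


# The summary computed below corresponds roughly to this HiveQL:
#   SELECT country, COUNT(*), SUM(clicks) FROM ad_events GROUP BY country;


def map_row(row: AdEvent) -> tuple[str, tuple[int, int]]:
    # Map: key each row by the GROUP BY column; the value carries
    # (row count contribution, clicks).
    return row.country, (1, row.clicks)


def reduce_group(values: list[tuple[int, int]]) -> tuple[int, int]:
    # Reduce: sum the counts and the clicks for one key.
    return sum(v[0] for v in values), sum(v[1] for v in values)


def summarize(rows: Iterable[AdEvent]) -> dict[str, tuple[int, int]]:
    # Group mapped values by key, then reduce each group.
    groups: dict[str, list[tuple[int, int]]] = defaultdict(list)
    for key, value in map(map_row, rows):
        groups[key].append(value)
    return {key: reduce_group(values) for key, values in groups.items()}


if __name__ == "__main__":
    events = [AdEvent("US", 3), AdEvent("IN", 5), AdEvent("US", 4)]
    print(summarize(events))
```

The appeal of a tool like Hive is that an analyst writes only the one-line query in the comment, and the translation into map and reduce steps like these happens automatically.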

Cloudera is offering an exclusive discount code for the Hadoop event in New York City, which will knock 25 percent off the list price of $399 per ticket. It's valid through 9/29.