Databricks Snags $33M In Series B And Debuts Cloud Platform For Processing Big Data

Databricks, the commercial entity created by the developers of the open source Apache Spark project, announced $33M in Series B funding today and the launch of a new cloud product, their first one as a company.

There is little doubt that big data is a big deal these days and companies are popping up to help customers process the data. Databricks hopes to simplify the entire matter by moving it to the cloud to reduce management headaches, while speeding it up by using Apache Spark to drive the platform.

First, let’s look at the funding, which is led by New Enterprise Associates (NEA) with a contribution from previous investor Andreessen Horowitz. It brings the total funding to date to $47M.

The latest round gives the company a huge financial boost, and CEO Ion Stoica says they hope to increase the number of employees and expand rapidly.

In addition to the funding, the company also announced a new cloud platform called Databricks Cloud that Stoica says has been designed to simplify big data processing by bringing the process under one cloud umbrella.

The cloud solution consists of three pieces: the Databricks Platform, Spark and the Databricks workspace. The idea behind the product, Stoica says, is to provide a single place to process data without having to worry about managing a Hadoop cluster. Instead, it’s all done in the cloud, in a managed environment.

After you add your data to a project, you can begin working with it immediately. The product has several core concepts starting with Notebooks, which provide a way to interact with the data and build graphs. As you discover ways of displaying your data, you can begin to build dashboards to monitor certain types of data. Finally, the platform includes a job launcher, which enables users to schedule Apache Spark jobs to run at certain times.

The product has been designed to allow customers to access and plug in third-party Spark applications, so if they have additional requirements not available in the base Databricks platform, they can use existing third-party applications to take advantage of whatever those tools have to offer.

The company believes that by providing a set of tools built in the cloud, they will remove much of the pain and complexity involved with a typical big data processing project, where so much time is spent simply getting the right tool set in place before any work even gets done.

Initially, Stoica told TechCrunch, the product will run on AWS, but they are looking to expand to Google Compute Engine and Microsoft Azure, and their large infusion of cash should help facilitate that.

The company was born out of the Apache Spark project, which was originally developed by Stoica and colleagues from research at the University of California, Berkeley in 2009. He and his fellow researchers were looking for something that was faster than Hadoop, and they developed Spark, which they open sourced in 2010.

Stoica says that it’s faster for a number of reasons, including that it requires less code to process a job and that it runs entirely in memory rather than relying on disk reads, which can slow down processing.

Last year, as Spark was gaining traction, they decided to create a commercial entity on top of it and got $13.7M from Andreessen Horowitz to build the product and the company. The company continues to support the open source project.

Today marks the debut of their first product. Stoica indicated the platform is available to a limited set of customers today, but they will be expanding that gradually in the coming months.