Databricks, the commercial company created from the open source Apache Spark project, today announced the release of a free Community Edition aimed at teaching people how to use Spark. The new edition is meant as an adjunct to the free online courses (MOOCs) the company created last year.
The free version is a limited edition without all of the advanced features you would find in the paid enterprise version. The company is providing a single Spark “micro-instance” for anyone interested in learning how to work with the big data processing tool.
The MOOCs were amazingly popular, with 50,000 people starting the courses and a full 20,000 going all the way through to completion, including all the labs and tests, Ali Ghodsi, CEO and co-founder at Databricks, explained.
When the company saw this level of interest in learning about Spark, it decided to develop the Community Edition as a complement to the courses. By combining Community Edition with the MOOCs, Databricks is providing a way for people to learn about data science and Spark regardless of income or location. All they need is a computer with internet access and the motivation to complete the courses.
Students can access basic Databricks functionality including creating a micro cluster, playing with the cluster management tool and creating a notebook and dashboard. If they wish to go further with larger clusters and more advanced functionality, they can simply swipe a credit card and move up to one of the business tiers.
Ben Horowitz, co-founder and general partner at Andreessen Horowitz, one of Databricks’ investors, says the Community Edition substantially lowers the barrier for understanding big data and analytics.
“Before the release of Community Edition you had to either build your own cluster or pay thousands of dollars per month to learn how to do data science and advanced analytics such as machine learning,” Horowitz said in a statement.
Instead, the company is using Amazon Web Services to provide the computing resources for students to build and store clusters. It is able to do this and control costs by using a highly shared environment, which the company manages very carefully.
“We are sharing machines to drive down cost considerably. They are shut down if you are not using them, so we can reuse [the resources]. We used this [approach] for one of the MOOCs under the hood and it worked remarkably well,” Ghodsi said.
It’s not all about altruism, though. The company believes that by training more people to use Spark and offering these tools for free, it can turn many of these students into paying customers, and it has found that the approach has worked with the MOOCs alone. Combining the courses with the free Community Edition should increase the number of students who convert.
“It’s an excellent lead source. The truth is it makes a lot of business sense,” he said.
Databricks plans to add three additional classes this year and hopes to train at least 100,000 people through its MOOCs over the next year. That could be a reachable goal, especially with the free Community Edition helping to drive usage.