Google Cloud launches a managed Spark service

At its Cloud Next event, Google today announced the launch of Spark on Google Cloud as a fully managed service. With this, the popular open source data processing engine will become a premium offering on Google Cloud.

“With this innovation, Spark finally arrives in the cloud-native world,” said Gerrit Kazmaier, Google’s VP & GM for Database, Data Analytics & Looker. “It allows data engineers and data scientists to work with Spark without worrying about clusters and configurations. We also integrated it into all of our data services. So you can launch it directly from BigQuery, from Vertex AI, from Dataplex. It makes using Spark so easy that it allows our customers to use the frameworks and the toolkits that they’re familiar with; they love the data science experience, and they can now consume it in a cloud-native way.”

Google argues that this is the “world’s first autoscaling and serverless Spark service for the Google Cloud data platform.” But it’s worth noting that, given Spark’s popularity, there are plenty of other companies that will run and manage it for their customers. Spark is also at the center of Databricks’ platform, which is maybe no surprise, given that the well-funded startup was founded by the creators of Spark.

You may also wonder: Doesn’t Google Cloud already offer a managed Spark service as part of Dataproc? (That is, of course, if you’re one of the five people who are capable of remembering every service that Google, Amazon and Microsoft now offer in their clouds.)

These are different services, targeting different customers, though, Kazmaier told me. If you already have Spark, Hadoop or maybe MapReduce, Presto and other systems up and running, then the idea here is that Dataproc will give you all of these, but as a managed service. But for Kazmaier, the focus of what he’s building out around Google Cloud’s data services is all about simplicity and especially about making life easier for companies that are just getting started in their data journey.

“You’re building a data team, and you hire one data engineer and one data scientist. Do you really want to get started by saying: ‘I’ll now build a storage system. I’ll build up a metadata system from the ground up’? Of course not, but this is literally what you are forced to do today,” he said. “Now with Spark serverless, you just say ‘go.’”