Microsoft bets on Apache Spark to power its big data and analytics services

Microsoft today announced that it is making a serious commitment to the open source Apache Spark cluster computing framework.

After dipping its toes into the Spark ecosystem last year, the company today moved a number of Spark-based services out of preview and announced that the on-premises version of R Server for Hadoop (which uses the increasingly popular open source R language for big data analytics and modeling) is now powered by Spark.

In addition, Microsoft announced that R Server for HDInsight (essentially the cloud-based version of R Server) is coming out of preview later this summer, and Spark for Azure HDInsight is now generally available with support for managed Spark services from Hortonworks. Power BI, Microsoft’s suite of business intelligence tools, will now also support Spark Streaming to allow users to push real-time data from Spark right into Power BI.

All of these announcements mark what Microsoft calls “an extensive commitment for Spark to power Microsoft’s big data and analytics offerings.” These offerings include Power BI and R Server, but also the Cortana Intelligence Suite, which combines some of Microsoft’s big data and analytics services under a single umbrella that also features a number of machine learning tools.

Microsoft, as well as Google, Baidu, Amazon, Databricks and others, will feature prominently at the Spark Summit in San Francisco this week. Microsoft promises to share more information about its commitment to Spark at the event, too.