The year in analytics

Well, 2016 is officially in the past. Between the election drama, the stock markets tossing and turning, celebrities moving on and Harambe, it was a doozy.

The SaaS correction

First things first. It’s hard to make sense of the overall state of the analytics ecosystem without taking into account the SaaS multiples meltdown in February. Most sophisticated observers have come to grips with the fact that a SaaS company in hyper-growth mode chews up every available dollar and then some. However, unless one has access to very fine-grained customer acquisition costs, churn and cohort activity data, it’s fiendishly difficult to tell the difference between a healthy hyper-growth SaaS company and one that is overpaying for low-quality growth.
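
As a back-of-the-envelope sketch of why that granularity matters (every number and name below is invented purely for illustration, not drawn from any real filing), the same top-line growth can hide very different unit economics:

# Hypothetical unit economics for two SaaS companies with identical revenue growth.
# All figures are made up to illustrate the point.
def ltv_to_cac(revenue_per_customer_per_month, gross_margin, monthly_churn, cac):
    lifetime_value = revenue_per_customer_per_month * gross_margin / monthly_churn
    return lifetime_value / cac

healthy = ltv_to_cac(revenue_per_customer_per_month=500, gross_margin=0.75,
                     monthly_churn=0.015, cac=6000)    # roughly 4.2x
overpaying = ltv_to_cac(revenue_per_customer_per_month=500, gross_margin=0.75,
                        monthly_churn=0.04, cac=12000)  # roughly 0.8x
print(healthy, overpaying)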

Data of that fine granularity is something that your friendly neighborhood SaaS CFO is unlikely to include in their 10-Qs. This, combined with the sheer amount of cash being burned, makes investors very twitchy when it looks like growth is slowing. LinkedIn had such an unfortunate moment in 2016, and managed to ruin a lot of other companies’ nights in early February. Tableau, Qlik, Salesforce and many others got hit especially hard.

The state of data warehouses

We’ll start with how the data warehousing world shook out in 2016. Data warehouses are where data is stored and queried for analytical use. Unless you use an all-in-one cloud BI provider (e.g. Domo, GoodData, etc.), a data warehouse is a central component of any analytics infrastructure. In 2016, the main trends we noted in 2015 continued: SQL-on-Hadoop displacing traditional analytics databases, and the consolidation of data warehouses into cloud-hosting-provider offerings.

SQL-on-Hadoop

From its earliest days, Hadoop has largely been used for business intelligence and analytics use cases, even when it wasn’t the best choice for them. In 2016, the various SQL-on-Hadoop options started to crowd out other analytics data warehouses.

The major projects (Hive, Impala, Presto and Spark) all shipped major releases that dramatically improved performance and stability. Hive, the original SQL-on-Hadoop player, delivered a key upgrade in March that will probably keep it lingering in people’s stacks for a few more years. Though Hive is known primarily as a stable, legacy option, its new LLAP feature (don’t worry about what the acronym means) allows short-running queries to come back in “interactive” time. Previously, it was difficult to run any query, no matter how small, in less than five seconds.

Impala (2.6 in July) and Spark (2.0 in July) both delivered major performance improvements on large queries. Presto has had a steady drumbeat of improvements, culminating in the November 30th announcement of AWS’s Athena, a managed Presto service that queries data directly in S3.
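
To make the Athena announcement concrete: it exposes Presto’s SQL dialect over files already sitting in S3, behind a plain API call. A minimal sketch with boto3, assuming an “analytics” database and an “events” table have already been defined in the Athena catalog (those names, and the results bucket, are placeholders):

import time
import boto3

athena = boto3.client("athena", region_name="us-east-1")

# Kick off a SQL query against data that lives in S3; there is no cluster to manage.
query = athena.start_query_execution(
    QueryString="SELECT event_name, COUNT(*) FROM events GROUP BY event_name",
    QueryExecutionContext={"Database": "analytics"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)
qid = query["QueryExecutionId"]

# Athena is asynchronous: poll until the query finishes, then read the rows back.
while athena.get_query_execution(QueryExecutionId=qid)["QueryExecution"]["Status"]["State"] in ("QUEUED", "RUNNING"):
    time.sleep(1)

for row in athena.get_query_results(QueryExecutionId=qid)["ResultSet"]["Rows"]:
    print([col.get("VarCharValue") for col in row["Data"]])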

With all this momentum and maturity, it will be increasingly difficult for the likes of Teradata, Vertica or Aster to win any new accounts. Much like the emergence of Linux, Apache or WordPress closed out the operating system, web server and blogging software categories, respectively, a robust open source winner can both close a category to new proprietary entrants and doom existing competitors to a slow decline. While a single winner hasn’t emerged here, the trend feels firmly set, and SQL-on-Hadoop as a whole feels more and more like the new default for large-scale analytics data warehouses.

Cloud provider managed data warehouses

With Azure announcing the general availability of its SQL Server-based data warehouse in July, all three of the major cloud-hosting players (Amazon, Google and Microsoft) now have a viable, fully managed data warehouse offering. While opinions differ on the relative merits of each, it’s going to be increasingly common to choose a database technology based on what can be turned on with a couple of clicks at your existing hosting provider. 2016 may well be the year that a new truism takes hold: “No one got fired for using their cloud provider’s managed data warehouse.”
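
For a sense of what “a couple of clicks” amounts to, this is roughly the entire provisioning step for Redshift via boto3; the identifiers and credentials below are placeholders, and the console flow is an equally short form:

import boto3

redshift = boto3.client("redshift", region_name="us-east-1")

# Spin up a small managed warehouse; patching, backups and replication
# are the provider's problem from here on.
redshift.create_cluster(
    ClusterIdentifier="analytics-warehouse",
    NodeType="dc1.large",
    MasterUsername="admin",
    MasterUserPassword="Replace-me-123",
    NumberOfNodes=2,
)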

In other news, Microsoft’s softening stance toward the open source world has led to the previously unthinkable: SQL Server now runs on Linux! Announced in March to some fanfare, with a public preview arriving in November, this marks a tremendous shift away from Microsoft’s usual Windows Server-centricity, and hopefully a sign of further openness to come.

Business intelligence applications

The biggest news in the realm of business intelligence applications, the applications people use to pull data out of the aforementioned data warehouses, has largely centered on the big three cloud-hosting providers.

Microsoft products stereotypically need three releases before fully coming into their own. In 2016, Power BI went from being an interesting placeholder of a product to being quite functional. It feels well into its 2.0 stage and, given that history, should start picking up significant steam (and customers).

The elephant in the room just decided to wake up and start stretching its legs. QuickSight, Amazon’s BI offering, went GA in November of 2016. If you’re a company that depends on being the “default” interface to Redshift, you’re probably starting to lose sleep. Ecological niches have a habit of being filled, and time will tell whether Amazon can build a winning end-user-facing application; so far, all of the big AWS successes have been infrastructure rather than end-user interfaces. However, the growing dominance of Redshift as the default data warehouse for anyone on AWS, coupled with AWS’s growth rate, makes QuickSight well worth watching. There is a lot of spending adjacent to the data warehouse, and Amazon is making a concerted grab for it.

Not wanting to be left out, Google announced a reporting and dashboarding tool called Data Studio in May.

Qlik went private

If you’re in the U.S./Silicon Valley, Qlik is the biggest BI company you keep forgetting about. Well, Thoma Bravo remembered and took them private in June. In what is becoming the worst-kept secret in Silicon Valley, it turns out SaaS companies are great buyout candidates. If you take a SaaS company with a solid product and adoption whose growth has stuttered, and cut everything back to a skeleton crew of engineering, customer support and accounting, the cash flows can look very, very tempting.

While it’s risky to do this too early in a company’s product life cycle, once major companies have wired you into their processes, they’re not inclined to do much fiddling. All the margin of “software” with none of the cash trough of SaaS growth. What’s not to love?

Salesforce Wave

One of the interesting possibilities exposed when Salesforce launched Wave was that it would move the company toward making significant revenue from the BI category. That seems not to have worked. Instead, Salesforce appears to be betting on verticalization: it launched Analytics for Community Cloud in March (i.e. embedding analytics results in other applications), declared Wave a platform for others to build analytics applications on top of in June, and shipped dedicated marketing and financial services analytics in September.

This mostly puts to rest the possibility of Salesforce becoming a major player in the BI space directly; Wave now looks more like a way to add reporting capabilities to Force applications.

Tableau

Tableau released a new version this year, with more connectors, data clustering capabilities, better mobile support, cross-database joins and a redesign. However, growth slowed and losses increased compared to the last few years. This, combined with the LinkedIn announcement, led to Tableau’s stock price dropping by half in the February SaaS apocalypse, and it still hasn’t recovered. Bad times all around.

Periscope raised big

Joining Looker in the “is it or isn’t it reaching escape velocity?” club, Periscope raised a big honking $25 million Series B from Bessemer in November. It is settling nicely into the niche of bring-your-own-data-warehouse (read: Redshift) plus a managed caching layer. That said, with the announcement of QuickSight, Periscope and many other startups that rode the “frontend to Redshift” wave are in a tricky position: they are now directly competing with AWS’s own offerings, rudimentary though those might be. Most of the pack quickly added BigQuery support this year and is hoping to diversify its customer base.

Data collection

Piping your interaction data into Redshift and using a BI application instead of Mixpanel and friends is starting to move from the bleeding edge to commonly accepted wisdom.

In the shadow of Amazon’s almost-but-not-quite-there offering (Mobile Analytics + Data Pipeline) for getting interaction data into Redshift, there has been a bit of consolidation in the “piping data into Redshift” segment of the market. On many levels, the existence of these companies is really an indictment of how clunky and cumbersome AWS’s data movement options are.
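
Strip away the branding and the emerging stack is simple: a collection pipeline drops raw event files into S3, Redshift’s COPY command bulk-loads them, and the BI tool issues ordinary SQL. A minimal sketch with psycopg2, in which the host, credentials, bucket, IAM role and table names are all hypothetical:

import psycopg2

# Redshift speaks the Postgres wire protocol, so a stock Postgres driver works.
conn = psycopg2.connect(
    host="analytics-warehouse.abc123.us-east-1.redshift.amazonaws.com",
    port=5439, dbname="analytics", user="admin", password="Replace-me-123",
)
cur = conn.cursor()

# Bulk-load the raw event JSON the pipeline has already landed in S3.
cur.execute("""
    COPY events
    FROM 's3://my-event-bucket/2016/12/'
    IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopy'
    FORMAT AS JSON 'auto';
""")

# The kind of question Mixpanel would otherwise answer: daily active users.
cur.execute("""
    SELECT DATE_TRUNC('day', occurred_at) AS day,
           COUNT(DISTINCT user_id)        AS daily_active_users
    FROM events
    GROUP BY 1
    ORDER BY 1;
""")
print(cur.fetchall())
conn.commit()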

There has been a bit of a shakeout, with a number of startups going quiet and a first set of winners starting to emerge. Segment continues to own developer mindshare for the current Segment + Redshift + BI application stack. Panoply raised a Series A from Intel Capital in August, and Alooma, which raised from Sequoia in 2014, is still going strong. RJMetrics, an early all-inclusive cloud analytics provider, sold off its Cloud BI product to Magento in August and formed a new company, Stitch, around its data pipeline product.

The big picture

The play unfolding is a consequence of the growth of cloud computing providers taking over the lion’s share of new hosting. When all your infrastructure is on AWS, Azure or Google Compute, it takes a very strong reason to not use your provider’s fully managed data warehouse solution.

The writing is also on the wall that the same cloud providers will be providing the front end to this data. Add some analyst and data infrastructure spending, and you get a fully unified data warehouse in the sky for a fraction of what it cost Google, Facebook or Netflix to build the equivalent 10 years ago.

Most vertical analytics providers are going to need to start having strong answers to “why can’t I just run everything you provide as a SQL query on top of my company’s data on Redshift?” The gold rush in two-person SaaS vertical analytics companies is going to start looking more and more grim.
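
To make that question concrete, here is the sort of query a two-person vertical analytics product effectively resells as a dashboard, runnable directly against a hypothetical events table in Redshift (all table and column names are placeholders):

import psycopg2

# Weekly signup cohorts, and how many users from each cohort came back the next week.
conn = psycopg2.connect(
    host="analytics-warehouse.abc123.us-east-1.redshift.amazonaws.com",
    port=5439, dbname="analytics", user="admin", password="Replace-me-123",
)
cur = conn.cursor()
cur.execute("""
    WITH cohorts AS (
        SELECT user_id, DATE_TRUNC('week', MIN(occurred_at)) AS cohort_week
        FROM events
        GROUP BY user_id
    )
    SELECT c.cohort_week,
           COUNT(DISTINCT c.user_id) AS signed_up,
           COUNT(DISTINCT CASE WHEN DATE_TRUNC('week', e.occurred_at) =
                                    c.cohort_week + INTERVAL '7 days'
                               THEN e.user_id END) AS returned_next_week
    FROM cohorts c
    LEFT JOIN events e ON e.user_id = c.user_id
    GROUP BY 1
    ORDER BY 1;
""")
print(cur.fetchall())

If a handful of queries like this cover the headline features of your product, the pitch has to rest on something harder to replicate than SQL.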