The cloud backlash has begun: Why big data is pulling compute back on premises

The great cloud migration has revolutionized IT, but after a decade of cloud transformations, the most sophisticated enterprises are now taking the next generational leap: developing true hybrid strategies to support increasingly business-critical data science initiatives and repatriating workloads from the cloud back to on-premises systems. Enterprises that haven’t begun this process are already behind.

The great cloud migration

Ten years ago, the cloud was mostly used by small startups that didn’t have the resources to build and operate physical infrastructure, and by businesses that wanted to move their collaboration services onto managed infrastructure. Public cloud services (and cheap capital in a low interest-rate economy) meant such customers could serve a growing number of users relatively inexpensively. This environment enabled cloud-native startups such as Uber and Airbnb to scale and thrive.

Over the next decade, companies flocked en masse to the cloud because it lowered costs and expedited innovation. This was truly a paradigm shift, and company after company announced “cloud-first” strategies and moved their infrastructures wholesale to cloud service providers.

The growing backlash

However, cloud-first strategies may be hitting the limits of their efficacy, and in many cases, ROIs are diminishing, triggering a major cloud backlash. Ubiquitous cloud adoption has given rise to new challenges, namely out-of-control costs, deepening complexity and restrictive vendor lock-in. We call this cloud sprawl.

The sheer quantity of workloads in the cloud is causing cloud expenses to skyrocket. Enterprises are now running core compute workloads and massive storage volumes in the cloud — not to mention ML, AI and deep learning programs that require dozens or even hundreds of GPUs and terabytes or even petabytes of data.

The costs keep climbing with no end in sight. In fact, some companies are now spending up to twice as much on cloud services as they were before they migrated their workloads from on-prem systems. Nvidia estimates that moving large, specialized AI and ML workloads back on premises can yield a 30% savings.

Furthermore, new regulations are complicating cloud environments. U.S. and European data sovereignty laws require enterprises to manage and isolate data in multiple regions according to varying compliance regulations, with compute attached to each one. This makes a single-region, single-cloud design no longer feasible for sophisticated global enterprises, further adding to the cost and complexity of infrastructure.

Lastly, cloud service providers have continued to move up the stack, offering not just infrastructure as a service (IaaS) but also platform as a service (PaaS) and software as a service (SaaS) in one convenient, integrated cloud deployment. These PaaS and SaaS offerings are a double-edged sword: while they provide ease of use and expedite time to value, they also carry higher prices and margins and lead to vendor lock-in.

Unlike S3, AWS’ storage service whose API has become a de facto standard across clouds and on-prem storage providers, higher-level services like Google Cloud’s Vertex AI are unique offerings with no cross-cloud compatibility. The net effect of building these higher-in-the-stack services into your IT architecture is being locked into a specific cloud provider.
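
To make the portability point concrete, consider a minimal sketch in Python with boto3: because so many vendors implement the S3 API, the same client code can target AWS or a hypothetical S3-compatible on-prem object store (MinIO is one example) simply by swapping the endpoint. The endpoint URL, bucket and object names below are illustrative assumptions, not a specific recommendation.

```python
# A minimal sketch of S3-API portability, using boto3. The endpoint URL,
# bucket and object names are hypothetical placeholders; the point is that
# the same client code works against AWS S3 or any S3-compatible store.
import os

import boto3

# Leave S3_ENDPOINT_URL unset to talk to AWS S3, or point it at an
# S3-compatible on-prem object store, e.g. "https://objects.corp.example:9000".
endpoint = os.environ.get("S3_ENDPOINT_URL")

s3 = boto3.client(
    "s3",
    endpoint_url=endpoint,  # None means the default AWS endpoints
    aws_access_key_id=os.environ.get("AWS_ACCESS_KEY_ID"),
    aws_secret_access_key=os.environ.get("AWS_SECRET_ACCESS_KEY"),
)

# Identical calls regardless of where the object store actually lives.
s3.upload_file("training_data.parquet", "ml-datasets", "training_data.parquet")
response = s3.get_object(Bucket="ml-datasets", Key="training_data.parquet")
print(response["ContentLength"], "bytes read back")
```

Nothing comparable exists for the higher-level PaaS services: a pipeline written against Vertex AI or SageMaker has to be rebuilt, not simply re-pointed.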

The great repatriation

Thankfully, many of today’s most sophisticated companies realize that this new cloud paradigm is untenable and are developing a hybrid multicloud approach consisting of more than one public cloud provider and on-prem systems.

Andreessen Horowitz studied 50 top publicly traded software companies’ disclosed cloud costs and found that, “for every dollar of gross profit saved, market caps rise on average 24-25X the net cost savings from cloud repatriation. This means an additional $4B of gross profit can be estimated to yield an additional $100B of market capitalization among these 50 companies alone. Extending this analysis to the broader universe of scale public companies that stand to benefit from related savings, we estimate that the total impact is potentially greater than $500B.”

Walmart, for example, recently disclosed a years-long project to diversify its infrastructure with edge compute at its store locations, augmenting what was previously provided by its Azure and GCP cloud infrastructure. Walmart’s new multicloud structure enables it to “switch seamlessly” between Google’s and Microsoft’s web-based services and its proprietary servers. Walmart said that “the system has saved as much as 18% annually on overall cloud expenditures and mitigates the potential for outages.”

If Fortune 50 organizations repatriating workloads back to on-prem aren’t convincing enough, take a look at the cloud service providers themselves. All of the major cloud providers have invested in new products that add on-prem resources to their cloud stacks: AWS has debuted Outposts with Kubernetes support, Google is developing Anthos, and Microsoft is touting Azure Arc.

If Amazon, Google and Microsoft are pouring vast sums into product development for on-prem capabilities, clearly the writing is on the wall for future customer demand. Even cloud stalwarts like Snowflake are making pivots into the hybrid/on-prem space.

Multicloud is the new cloud, and multicloud now includes on-prem.

What should companies do next?

The good news is that companies now have more flexibility than ever in how they develop their infrastructures.

One of the great second-order benefits of the great cloud migration has been the development of a whole new category of technologies incubated by cloud service providers. Better DevOps automation tools and cloud-native application design with technologies like containerization and Kubernetes have proliferated from the cloud world and become accessible to any organization (e.g., Google had the wisdom to allow Craig McLuckie and Joe Beda to develop Kubernetes and open source it to the world).

These new tools and technologies have reduced the cost and operational overhead for companies to manage their own infrastructure. The cloud-native approach provides more flexibility for organizations to move workloads between different underlying IaaS stacks. While not all companies are ready for a wholesale repatriation, there are a few things that can be done to ensure future flexibility for where workloads are created.

  1. Don’t lock yourself in. Avoid cloud services that sit higher up the stack and are designed to create lock-in. IaaS offerings have given way to PaaS and SaaS offerings, each available only from a single cloud service provider and serving to lock you into that particular cloud. If you spend time migrating your data science and ML workloads to AWS SageMaker, for example, migrating them back to on-prem or to another cloud won’t be possible; Google’s Vertex AI exposes a completely different PaaS stack, and there are no easy paths to move those workloads. Instead, opt for a vendor-agnostic stack that is portable to other clouds.
  2. Make portability a priority for your architecture review committee. When developing applications, establish an architecture review process that checks whether software has hybrid-compatible underpinnings and ensures that applications embrace the architectural principles of the cloud, delivering cost savings, security and efficiency whether they run in the cloud or on-prem.
  3. Start investing in hybrid multicloud and decide for yourself. This can take the form of beachhead projects, such as picking one large AI training workload to run on-prem (see the sketch after this list), or of exploring vendors that are betting on hybrid, such as VMware, Red Hat and NetApp.
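
To illustrate what such a beachhead project could look like, here is a hedged sketch in Python using the official Kubernetes client: the same containerized GPU training job, described once, is submitted either to a managed cloud cluster or to an on-prem cluster simply by switching kubeconfig contexts. The context names, container image, namespace and GPU count are hypothetical placeholders, not anything prescribed by the vendors named above.

```python
# A hedged sketch of a "beachhead" repatriation project: the same containerized
# GPU training job, defined once with the official Kubernetes Python client,
# is submitted to a managed cloud cluster or an on-prem cluster simply by
# switching kubeconfig contexts. Context names, image, namespace and GPU
# count are hypothetical placeholders.
from kubernetes import client, config


def run_training_job(kube_context: str) -> None:
    # Load credentials for whichever cluster we are targeting.
    config.load_kube_config(context=kube_context)

    job = client.V1Job(
        api_version="batch/v1",
        kind="Job",
        metadata=client.V1ObjectMeta(name="model-training"),
        spec=client.V1JobSpec(
            template=client.V1PodTemplateSpec(
                spec=client.V1PodSpec(
                    restart_policy="Never",
                    containers=[
                        client.V1Container(
                            name="trainer",
                            image="registry.example.com/ml/trainer:latest",
                            resources=client.V1ResourceRequirements(
                                limits={"nvidia.com/gpu": "4"}
                            ),
                        )
                    ],
                )
            )
        ),
    )
    client.BatchV1Api().create_namespaced_job(namespace="ml", body=job)


# The workload definition never changes; only the target cluster does.
run_training_job("eks-us-east-1")       # managed cloud cluster
run_training_job("onprem-gpu-cluster")  # repatriated on-prem cluster
```

The point is the design choice, not the specific job: when a workload is defined in cloud-native, vendor-agnostic terms, deciding where it runs becomes a scheduling and cost question rather than a re-engineering project.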

Choose wisely

While cloud computing was once a panacea for savings and innovation, returns are now diminishing as AI and ML workloads drive both data volumes and the need for accelerated compute upwards, impacting bottom lines. However, the remarkable success of the cloud has given us powerful new ways of building and managing IT.

Today, the flexible architectural underpinnings that enabled the cloud-first giants have permeated enterprise stacks both on and off premises, making on-prem infrastructure just as nimble at addressing architectural needs. The choice between data centers and clouds, and even which cloud to use, is yours. Make it wisely.