As Cloud Arrives On Main Street, We Need A New Set Of Metrics For Cloud SLAs

Sharon Wagner Contributor

Editor’s note: Sharon Wagner is the founder and CEO of Cloudyn, a leading provider of cloud analytics and optimization tools for multi-cloud deployments. He is a leading expert and key patent holder in SLA technologies and previously worked at CA Technologies within its cloud-connected enterprise business unit.

A lot can happen in a year, and in the world of cloud computing, 2014 was a breakout one. Cloud adoption finally experienced a tornado of demand that swept up large enterprises en masse. Yet as businesses move services to the cloud and increasingly depend on third-party vendors, important questions must be answered about who is responsible for managing these services and how service quality should be measured.

The main objective of a Service Level Agreement (SLA) is to clearly define relationships and set expectations for adequate service levels between the buyer and the seller. In the case of the cloud, these are the cloud consumer and the cloud provider. A traditional SLA is a rigid, custom contract written in complicated legalese and focused on operational metrics for services that IT delivers using its own internal resources.

A cloud SLA is a different animal mainly because cloud customers leverage the cloud as an extension of their internal IT: They don’t own the infrastructure, they don’t maintain it, and they can’t control its provisioning or maintenance procedures. The cloud’s shared responsibility model splits the responsibility between the cloud provider and the cloud customer: The customer is responsible for the application SLA and the provider is responsible for the infrastructure SLA.

Four 9s or five 9s — does it really matter?

Cloud providers and customers typically zero in on availability, which measures the time a system is accessible and is expressed as the ratio of actual uptime to expected uptime, given as a percentage. Note that expected uptime is open to negotiation: It may or may not exclude scheduled maintenance hours, force majeure hours, and more.

Assuming no force majeure situations or scheduled maintenance hours, in a 365 x 24 year, five 9s (99.999 percent) represents about five minutes of downtime and four 9s (99.99 percent) represents about 53 minutes. Does it really matter?
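The arithmetic behind those figures can be checked with a short Python sketch. The function name is made up for this illustration, and the calculation assumes a 365-day year with no maintenance or force majeure exclusions, as in the paragraph above.

```python
# Illustrative sketch: yearly downtime implied by an availability target.
# Assumes a 365 x 24 year with no excluded hours.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Return the downtime budget, in minutes per year, for a target."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


for target in (99.99, 99.999):
    minutes = allowed_downtime_minutes(target)
    print(f"{target}% allows about {minutes:.1f} minutes of downtime a year")
```

If the contract excludes maintenance windows from expected uptime, the base number of minutes shrinks and the budget shrinks with it, which is why the definition of expected uptime matters as much as the number of 9s.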

Because availability is treated as an indicator of quality of service, rest assured that cloud vendors will continue to invest in additional infrastructure to support a growing number of customers and enterprises. Therefore, perhaps a better way to measure cloud availability would be to apply metrics that describe how failures actually behave, such as mean time to repair (MTTR) and mean time between failures (MTBF).
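To see why MTTR and MTBF say more than a single percentage, consider a minimal Python sketch with invented outage data. It uses the standard steady-state relation, availability = MTBF / (MTBF + MTTR). The two services and their outage lists are hypothetical.

```python
# Illustrative sketch with hypothetical data: two services with the same
# yearly availability but very different failure behavior.
PERIOD_MINUTES = 365 * 24 * 60


def mttr(outage_minutes: list[float]) -> float:
    """Mean time to repair: average outage duration."""
    return sum(outage_minutes) / len(outage_minutes)


def mtbf(outage_minutes: list[float], period: float = PERIOD_MINUTES) -> float:
    """Mean time between failures: total uptime divided by failure count."""
    return (period - sum(outage_minutes)) / len(outage_minutes)


def availability(outage_minutes: list[float]) -> float:
    """Steady-state availability, MTBF / (MTBF + MTTR)."""
    up, repair = mtbf(outage_minutes), mttr(outage_minutes)
    return up / (up + repair)


one_long_outage = [52.0]          # hypothetical service A
many_short_outages = [1.0] * 52   # hypothetical service B

for name, outages in (("A", one_long_outage), ("B", many_short_outages)):
    print(
        f"Service {name}: availability {availability(outages):.5%}, "
        f"MTTR {mttr(outages):.0f} min, MTBF {mtbf(outages) / 60:.0f} h"
    )
```

Both services report the same availability, yet one fails once for nearly an hour and the other fails every week for a minute. Which is worse depends on the workload, and only MTTR and MTBF expose the difference.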

Redefined metrics for the cloud

Undeniably, availability is an important metric. However, good service in the cloud goes far beyond availability. For enterprise-level companies seeking to monitor or control the migration of their workloads from on-premises to the public cloud, we believe that a new standard is needed that measures a group of categories.

While most cloud vendors focus on availability and provide credits based on availability, a cloud SLA should reflect multiple service level objectives (SLOs) and various aspects of the provided service. SLOs should be categorized and measured as follows:

Availability: Metrics such as uptime, MTTR and MTBF
Performance: Response time, number of simultaneous requests and service throughput
Support: Response time, resolution time, resolution rates and service escalation rates
Security: Authentication and identity assurance, and MTTR for vulnerability remediation
Data management: Data mirroring latency, backup retention time and transfer rates

Since no standard SLA is used across cloud providers, the providers that redefine and build this new set of SLO standards will be the ones to help enterprises accelerate the onboarding of their critical applications to the cloud.

End-to-end: The right SLA for the shared responsibility model

Many businesses blindly take for granted that they will be protected by an SLA. While the SLA is used to settle any dispute between a provider and a customer, in the case of infrastructure failure, the SLA cannot be considered “protection.” Customers who provide their users with a service built on public cloud infrastructure should define SLAs end-to-end.

Let’s take the example of a cloud customer who provides a CRM platform to its users and runs it on the services of one of the public cloud providers. While the cloud provider is responsible for the infrastructure, the platform availability is managed by the customer. The end-to-end CRM SLA will include metrics such as service availability. This will be a combined business metric that includes the cloud infrastructure availability SLO and the CRM application availability SLO. After all, end users are not interested in the shared responsibility model; they just want the service to be available.
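One simple way to combine the two SLOs is sketched below in Python. It assumes the CRM application depends entirely on the infrastructure and that the two layers fail independently, so the end-to-end availability is the product of the layer availabilities. The figures are hypothetical and are not taken from any provider's SLA.

```python
# Illustrative sketch: end-to-end availability for layers in series.
# Assumes independent failures; the SLO values below are hypothetical.
import math


def end_to_end_availability(layer_availabilities: list[float]) -> float:
    """Multiply per-layer availabilities (fractions between 0 and 1)."""
    return math.prod(layer_availabilities)


infrastructure_slo = 0.9995  # hypothetical cloud provider SLO
application_slo = 0.999      # hypothetical CRM application SLO

combined = end_to_end_availability([infrastructure_slo, application_slo])
print(f"End-to-end availability: {combined:.4%}")
```

Under these assumptions the combined figure is always lower than either layer's own SLO, so a customer that simply passes the provider's number through to its users is promising more than the stack can deliver.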

Standardization: Is it coming?

Until now, SLAs have been unregulated in what has been a sellers’ market, with the cloud service providers calling all the shots. However, the pendulum has started shifting slowly in favor of the buyers, ultimately making it better for businesses.

The evolution of these new standards will mirror what happened in the IT market. In the IT world, the Information Technology Infrastructure Library (ITIL) became the best-known standard after organizations started independently creating their own IT management practices. As enterprises demand metrics from cloud providers, those metrics will eventually become mandatory requirements, with credits and penalties for violations. We expect such a shift to occur over the next 12 months.

Since the Edward Snowden disclosures of 2013, there has been increasing pressure on governments and technology companies to provide more transparency. Europe has been one of the strongest supporters of change. In June of last year, Europe took a leap forward and began establishing cloud SLA standardization. NIST (the National Institute of Standards and Technology, U.S. Department of Commerce) provides further SLA guidelines for vendors.

Cloud SLAs will be standardized, one way or the other. It’s really just a function of time. After all, the cloud is ubiquitous and doesn’t have borders.