Resorting to chaos may not sound like a sound engineering method, but chaos engineering is quickly becoming a standard way to test complex systems before real-world outages put those systems to the test. Netflix’s engineering team open-sourced Chaos Monkey back in 2012, and it remains one of the most widely used tools for chaos engineering. Today, Microsoft Azure launched a similar tool for users of its cloud platform: Azure Chaos Studio.
Using Chaos Studio, Azure users can see how their apps respond to real-world disruptions by subjecting them to random outages, extreme network latency, expired secrets and even complete data center outages. It’s one thing to theorize about what would happen in one of those scenarios and plan accordingly. It’s another to see it in action. Given the complexity of modern data center infrastructure, chances are that even a minor outage somewhere could cascade into a much larger issue and, before you know it, your platform is down for a few days.
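To make the idea concrete, here is a minimal, generic sketch of fault injection in Python. It is not Chaos Studio's API or any vendor's implementation. Every name in it (`with_chaos`, `InjectedFault`, `call_with_retries`, `fetch_profile`, `run_experiment`) is hypothetical and chosen for illustration. The sketch wraps a dependency call so that it randomly fails or slows down, then measures how often a simple retry policy still gets an answer.

```python
import random
import time


class InjectedFault(Exception):
    """Stands in for a real outage; raised deliberately by the wrapper."""


def with_chaos(func, rng, failure_rate=0.2, max_delay_s=0.0, sleep=time.sleep):
    """Wrap func so each call may be delayed and may fail, as decided by rng."""
    def wrapper(*args, **kwargs):
        if max_delay_s > 0:
            # Simulated network latency: wait a random time before answering.
            sleep(rng.uniform(0, max_delay_s))
        if rng.random() < failure_rate:
            # Simulated outage: the dependency never answers.
            raise InjectedFault(f"injected outage in {func.__name__}")
        return func(*args, **kwargs)
    return wrapper


def call_with_retries(func, attempts=3):
    """The resilience logic under test: retry a few times, then give up."""
    last_error = None
    for _ in range(attempts):
        try:
            return func()
        except InjectedFault as err:
            last_error = err
    raise last_error


def fetch_profile():
    # Hypothetical dependency; a real system would make a network request here.
    return {"user": "demo"}


def run_experiment(calls=1000, seed=42):
    """Return the fraction of calls that succeed despite injected faults."""
    rng = random.Random(seed)  # seeded so the experiment is repeatable
    flaky = with_chaos(fetch_profile, rng, failure_rate=0.2)
    succeeded = 0
    for _ in range(calls):
        try:
            call_with_retries(flaky)
            succeeded += 1
        except InjectedFault:
            pass
    return succeeded / calls


if __name__ == "__main__":
    print(f"success rate under injected faults: {run_experiment():.3f}")
```

Seeding the random number generator keeps a run repeatable, which makes it easier to compare results before and after changing the retry policy. Managed services apply the same principle to real infrastructure rather than to a single function call.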
It’s worth noting that AWS offers similar capabilities to its users with its Fault Injection Simulator and, as with any popular engineering concept, there are startups like Gremlin that specialize in chaos engineering as a service.