PagerDuty has raised $10.7 million from Andreessen Horowitz to build out an IT alert service modeled on what Amazon Web Services developed to manage its massive infrastructure. The funding will go toward building additional services that give IT managers a way to quantify how staff respond to alerts, and toward developing new mobile apps for notifications. A Y Combinator startup, PagerDuty raised a $1.9 million seed round in 2010 from angel investors.
PagerDuty is a SaaS provider that plugs into IT monitoring tools to provide phone and SMS alerts to IT workers. Co-founder and CEO Alex Solomon said in an interview that the service is largely based on the systems AWS had to build to keep its infrastructure running when he worked there. He said AWS had an engineering team of about 10,000 people.
The engineering group is divided into teams of five to 10 people. Engineers carry pagers on their belts and get notified when a problem arises. If they don’t respond, managers get a call. It’s a homegrown system, much like ones Facebook and Google had to create for themselves in order to manage their own infrastructures.
PagerDuty’s system pulls in data from a long list of IT-monitoring tools, including Nagios for server monitoring, Pingdom for website monitoring and New Relic for instrumenting apps. When an error pops up, the scheduling service routes the message to the person who is on call. If there is no response, the problem gets escalated.
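The route-then-escalate flow can be illustrated with a minimal Python sketch. This is not PagerDuty's implementation or API. The names `Alert`, `EscalationPolicy`, `route_alert` and the `acknowledges` callback are hypothetical, and the sketch assumes a simple ordered list of contacts, with the on-call person first and a manager after.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Alert:
    source: str   # monitoring tool that raised the alert, e.g. "Nagios"
    message: str  # human-readable description of the problem


@dataclass
class EscalationPolicy:
    # Hypothetical ordered contact list: on-call person first, then escalations.
    levels: List[str]


def route_alert(
    alert: Alert,
    policy: EscalationPolicy,
    acknowledges: Callable[[str, Alert], bool],
) -> Tuple[Optional[str], List[str]]:
    """Notify each level in order until someone acknowledges the alert.

    Returns the contact who acknowledged (or None if nobody did) and the
    list of everyone who was notified along the way.
    """
    notified: List[str] = []
    for contact in policy.levels:
        notified.append(contact)          # stand-in for a phone call or SMS
        if acknowledges(contact, alert):  # stand-in for waiting on a response
            return contact, notified
    return None, notified


if __name__ == "__main__":
    policy = EscalationPolicy(levels=["on-call engineer", "team manager"])
    alert = Alert(source="Nagios", message="web-01 is not responding")
    # Assume the on-call engineer misses the page and the manager picks up.
    owner, notified = route_alert(alert, policy, lambda c, a: c == "team manager")
    print(owner, notified)
```

In a real service the acknowledgement step would wait on a timeout and the contact list would come from an on-call schedule. Here both are reduced to a callback and a fixed list to keep the escalation order visible.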
The company is profitable, Solomon said. It has thousands of customers, including brands like Adobe, EA, Etsy, Square and Intuit, among others in the Fortune 500.
Most companies with their own data centers cobble together systems that deliver email or SMS when a problem arises, but it’s not unusual for messages to be dropped and go without a response for several hours. Usually no on-call schedule is integrated, so everyone gets the same alert. Some companies have network operations centers (NOCs), command centers staffed around the clock by people who watch the instruments for signs of trouble. NOCs can work, but they do not scale the way software can to monitor networked environments.
Andreessen Horowitz partner John O’Farrell said studies show that a one-hour outage costs $300,000. In the United States, businesses lose $26 billion a year to outages.
But data-center monitoring has not changed much in the past 10 to 12 years. IT management companies like CA and HP have a deep presence in the data center world, but their products are not designed as much for next-generation infrastructures.
Infrastructures are increasingly sophisticated and complex, but most of us take these data centers for granted. Our reliance will only increase as we process more and more data and need ever more finely tuned services to make sure the infrastructure is running properly.