Twitter’s having problems today! Oh, what’s that, you knew already? Well fine then. But we’ve just gotten our hands on some additional data related to today’s outage courtesy of application monitoring platform Compuware APM. The company pulled measurements from its backbone data centers, Last Mile network (actual end users) and mobile devices, and has basically documented the problem by examining response times across these sources.
We’re told that the current speculation is that the Twitter outage may have to do with the increased activity due to Olympics-related traffic.
The Olympics are expected to bring an unprecedented surge of activity by sports fans on social networking sites, the company tells us, and some sluggishness had already been reported. Compuware says it’s now diving into the data further to determine whether or not the outage had anything to do with the increased Olympics traffic, or if that was merely a coincidence.
For those who missed it (i.e., people with a life), Twitter experienced an extended outage today beginning around 11:23 AM ET, according to reports from performance monitoring company Apica. The service briefly recovered at 12:04 PM, then crashed again at 12:15 PM. At 12:58 PM, Twitter started delivering server-side error codes. One of those was the pretty hilarious one you see above. For whatever reason (groan), the message handing function there didn’t fill in what the reason was, leaving users scratching their heads.
Below, the charts. Click to see larger:
Twitter says that’s not the case, via a new blog post that went live around 5 PM ET today:
We are sorry. Many of you came to Twitter earlier today expecting, well, Twitter. Instead, between around 8:20am and 9:00am PT, users around the world got zilch from us. By about 10:25am PT, people who came to Twitter finally got what they expected: Twitter.
The cause of today’s outage came from within our data centers. Data centers are designed to be redundant: when one system fails (as everything does at one time or another), a parallel system takes over. What was noteworthy about today’s outage was the coincidental failure of two parallel systems at nearly the same time.
I wish I could say that today’s outage could be explained by the Olympics or even a cascading bug. Instead, it was due to this infrastructural double-whammy. We are investing aggressively in our systems to avoid this situation in the future.
On behalf of our infrastructure team, we apologize deeply for the interruption you had today. Now — back to making the service even better and more stable than ever.
- Mazen Rawashdeh, VP, Engineering (@mazenra)
Update, 7:30 PM ET: And, one more chart, just for fun (source: Keynote.com):