In a post entitled “Quite the way to celebrate our 200 millionth check-in”, Foursquare is once again apologizing this morning for downtime yesterday. It was only yesterday that a long, fairly technical explanation was written on the same blog going over exactly what led to 11 hours of downtime the day before. At the time, Foursquare wrapped it up, noting: “So we now have more shards and no danger of overloading in the short-to-medium term.”
Whoops. That seems to be exactly what happened again yesterday.
“In a nutshell, the same thing happened: an overloaded database, the solution to which again was manually redistributing check-in data to make sure no databases were overburdened and then rebooting the site, which we were finally able to do after nearly six hours of downtime. Unacceptably and frustratingly long, again.”
The service isn’t being quite so bold this time in predicting smooth sailing going forward. While they have some new safeguards in place, this time they’re just saying, “We’re hoping that these changes will help to stabilize things going forward.”
While they’ve had a very rough couple of days, Foursquare is still not close to the downtime Twitter experienced while it was growing. Hopefully the relatively small team (32 people) can work this out. At least they’re being open about the problems and communicating well — follow along here and here.