Twitter Changes Tweet Storage Strategy, Confirms Realtime Analytics Product

An interesting post just went up on the Twitter Engineering blog. Usually, that blog's posts are of interest mainly to developers working on Twitter’s platform. This one is too, but it also reveals two much larger things. First, Twitter won’t be using the Cassandra database system to store tweets. Second, Cassandra will be used for Twitter’s realtime analytics product, the one they haven’t officially announced yet.

It’s been believed for some time that analytics would eventually be a part of Twitter’s monetization strategy, but the company has never said much about it beyond vague statements that it was one potential idea. Two days ago, ReadWriteWeb’s Marshall Kirkpatrick dug up some evidence that it would be launching soon (which we also believe to be the case). And in tonight's post, Twitter’s Ryan King writes the following: “Our analytics, operations and infrastructure teams are working on a system that uses cassandra for large-scale real time analytics for use both internally and externally.”

Yep, large-scale realtime analytics — externally.

But the bigger news may be the shift in how Twitter plans to store tweets. Previously, Twitter intended to use Cassandra for tweet storage (dumping MySQL in favor of it), but that’s not going to be the case anymore, at least for now. “This is a change in strategy,” King notes. He goes on, “Instead we’re going to continue to maintain our existing Mysql-based storage. We believe that this isn’t the time to make large scale migration to a new technology.”

I’m assuming the time isn’t right for the migration because Twitter has been dealing with uptime issues as it faces levels of traffic that it has never seen before (thanks in part to the World Cup, which ends on Sunday). We have a query in to Twitter about that.

Cassandra is an open source Apache project to create a “highly scalable second generation distributed database.” It was originally open-sourced by Facebook back in 2008. King notes that the system will continue to be a key part of many of Twitter’s newer large scale projects, such as their geolocation places database, data mining for data used in top tweets and trends, and the aforementioned analytics. “We’re investing in Cassandra every day. It’ll be with us for a long time and our usage of it will only grow,” he concludes.