Twitter developers have long been pining for access to historical Tweets. Right now, the best they can get is seven days' worth, based on keyword search. DataSift, one of Twitter’s data partners, which currently provides developers and third parties with access to the full Twitter firehose in real time, will soon make historical Tweets accessible as well. Developers can sign up for the Alpha of DataSift’s Historical Data starting today (the actual service will begin to roll out in the first quarter of next year).
DataSift’s Historical service will give developers, social media monitoring companies, marketers, and brands access to 60 days of tweets during the Alpha, which can be analyzed and filtered beyond simple keyword search. When the service launches more broadly later next year, it will go back as far as two years. DataSift can support this kind of analysis because it pours all the tweets into a structured database, so you can give it queries like “Give me all the tweets that mention TechCrunch from people who do not follow @techcrunch” or “All females in the UK who mention fashion.”
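To make those two example queries concrete, here is a minimal Python sketch of what filtering structured tweet records could look like. It is not DataSift's query language or API. The `Tweet` record, its field names (`author_gender`, `author_country`, `author_follows`), and the sample data are all hypothetical, chosen only to illustrate filtering on more than keywords.

```python
from dataclasses import dataclass, field


@dataclass
class Tweet:
    """Hypothetical structured tweet record; field names are illustrative only."""
    text: str
    author: str
    author_gender: str = "unknown"
    author_country: str = "unknown"
    # Lowercase handles of the accounts the author follows (assumed field).
    author_follows: frozenset = field(default_factory=frozenset)


def mentions(tweet, term):
    """Case-insensitive substring match on the tweet text."""
    return term.lower() in tweet.text.lower()


def techcrunch_from_non_followers(tweets):
    """Tweets that mention TechCrunch from people who do not follow @techcrunch."""
    return [
        t for t in tweets
        if mentions(t, "techcrunch") and "techcrunch" not in t.author_follows
    ]


def uk_females_on_fashion(tweets):
    """Tweets mentioning fashion from authors recorded as female and in the UK."""
    return [
        t for t in tweets
        if t.author_gender == "female"
        and t.author_country == "UK"
        and mentions(t, "fashion")
    ]


# Made-up sample data for illustration.
SAMPLE = [
    Tweet("Reading TechCrunch this morning", "alice", "female", "UK",
          frozenset({"techcrunch"})),
    Tweet("TechCrunch has a story on DataSift", "bob", "male", "US"),
    Tweet("London fashion week starts today", "carol", "female", "UK"),
    Tweet("Fashion is not my thing", "dave", "male", "UK"),
]

if __name__ == "__main__":
    print([t.author for t in techcrunch_from_non_followers(SAMPLE)])
    print([t.author for t in uk_females_on_fashion(SAMPLE)])
```

The point of the sketch is that once each tweet carries attributes about its author, a query becomes a combination of predicates rather than a single keyword match.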
The company is already collecting 1 terabyte of data per day internally (that is how much the firehose of 250 million tweets a day produces), with 400 terabytes stored so far. “This is a real ‘big data’ engine, and that we are making it simple. We are taking advantage of map reduce, but this is our own bespoke processing engine,” says DataSift founder Nick Halstead, referring to the Hadoop technology the service is partially based on.
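For a rough sense of scale, the figures above can be turned into back-of-the-envelope numbers. The short Python sketch below assumes decimal terabytes (10^12 bytes) and, for the second calculation, a constant ingest rate. Neither assumption comes from DataSift, and daily volume was presumably lower in the past, so the results are only indicative.

```python
TB = 10**12  # assumption: decimal terabyte

DAILY_BYTES = 1 * TB          # 1 terabyte per day, as reported
DAILY_TWEETS = 250_000_000    # 250 million tweets per day, as reported
TOTAL_BYTES = 400 * TB        # 400 terabytes collected so far, as reported


def bytes_per_tweet(daily_bytes, daily_tweets):
    """Average stored bytes per tweet implied by the daily totals."""
    return daily_bytes / daily_tweets


def days_at_constant_rate(total_bytes, daily_bytes):
    """Days of collection the total would represent at today's rate (assumed constant)."""
    return total_bytes / daily_bytes


if __name__ == "__main__":
    print(bytes_per_tweet(DAILY_BYTES, DAILY_TWEETS))       # 4000.0
    print(days_at_constant_rate(TOTAL_BYTES, DAILY_BYTES))  # 400.0
```

Roughly 4 kilobytes per tweet is far more than the text of a 140-character message, which suggests that most of the stored volume is metadata and other data attached to each tweet.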