Get Ready For The Firehose. Search Is About To Get Realtime, Real Fast.

After months of negotiations and holding both at bay, Twitter now has agreements with both Bing and Google to give them access to its full feed of public Tweets. Both search engines have been yearning to drink directly from Twitter’s realtime firehose of micro-messages and all that they carry. A rudimentary version of Bing’s Twitter search is already live, and it will soon add public Facebook updates to its search results as well.

While financial terms of the deals were not disclosed, full access to Twitter’s data stream is very valuable to both search engines. Depending on how much Twitter was able to squeeze out of Google and Bing for these licensing deals, they are likely to provide its first major source of revenue. (Imagine if they had to pay by the Tweet.)

Tweets and other realtime data streams are valuable to Google and Bing because for many types of searches (news, events, sports, stocks, shopping, etc.), the most recent information is often the most relevant. And it’s hard to beat millions of people Tweeting out their thoughts (the “pulse of the planet,” if you will) for realtime information about every subject imaginable. Google and Bing need access to this stream of data if they want to keep their results fresh and relevant.

Up until now, they had to try to index Twitter’s site selectively by concentrating on high-profile Twitterers like celebrities. Twitter wouldn’t let their robots gobble up and index every Tweet because its servers wouldn’t be able to take that kind of pounding. But Twitter didn’t just want to hand over the feed of all of its public Tweets (the firehose) to the search engines without getting paid for it either.

Now that Google and Bing are getting the firehose, it could have a big impact on search results. For the search engines, the firehose is much more valuable than any single Tweet. They can index it and sift it, looking for patterns and spikes in keywords and shared links to get a better sense of what people across the Web are paying attention to at any given moment. This data can then be folded back into regular search results, even if the top result isn’t a Tweet.

For example, if a link to a post about healthcare reform on an obscure blog suddenly gains currency and is retweeted hundreds of times, that is a signal to perhaps rank that link higher in searches about “healthcare reform.” If people stop Tweeting about it, then maybe it goes down in the ranking. Either way, Google and Bing can use the firehose as a rich source of signals to mine and then blend back into regular search results.
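To make the "spike" idea concrete, here is a minimal sketch in Python. It is purely illustrative and is not how Google or Bing actually process the firehose. It assumes a hypothetical stream of (timestamp, url) pairs, one per Tweet or retweet that shares a link, and compares each link's count in the most recent window against its average over earlier windows. The function name, the window sizes, and the +1 smoothing are all assumptions made for the example.

```python
from collections import Counter


def link_spike_scores(events, now, window=3600.0, baseline_windows=24):
    """Score each shared link by how unusual its recent activity is.

    events: iterable of (timestamp_seconds, url) pairs (hypothetical format).
    now: current time in seconds, on the same clock as the timestamps.
    window: length of the "recent" window in seconds.
    baseline_windows: number of earlier windows used as the baseline.
    """
    recent = Counter()  # shares inside the most recent window
    past = Counter()    # shares inside the baseline period before it
    horizon = now - window * (baseline_windows + 1)

    for ts, url in events:
        if ts > now or ts <= horizon:
            continue  # ignore future events and anything older than the baseline
        if ts > now - window:
            recent[url] += 1
        else:
            past[url] += 1

    scores = {}
    for url, count in recent.items():
        baseline = past[url] / baseline_windows  # average shares per past window
        # The +1 keeps brand-new links from dividing by zero and damps noise.
        scores[url] = count / (baseline + 1.0)
    return scores
```

A link on an obscure blog that goes from nothing to ten shares in an hour scores far higher here than a link that has been shared ten times an hour all day. When people stop Tweeting about a link, it drops out of the recent window and its score disappears, which is roughly the rise-and-fall behavior described above.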

Of course, Tweets and other micro-messages will become part of results. And how the search engines display them and rank them will also determine how relevant their results are. Here is where it gets interesting because realtime search is a hard problem that has not yet been solved. Do you show the most recent, random Tweets first, or the ones with the most authority? And how do you rank a Tweet? We already have PageRank, but what we now need is StreamRank.
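Nobody has published a "StreamRank," so the following Python sketch is only one hypothetical way to trade off recency against authority. It assumes each Tweet is a dict with made-up fields `ts`, `followers`, and `retweets`, uses follower and retweet counts as a crude stand-in for authority, and halves a Tweet's score every `half_life` seconds. All of those choices are assumptions for illustration.

```python
import math


def stream_score(age_seconds, followers, retweets, half_life=1800.0):
    """Blend freshness and a rough authority proxy into a single score."""
    # Freshness halves every `half_life` seconds.
    freshness = 0.5 ** (age_seconds / half_life)
    # Log scaling so a million followers does not drown out everything else.
    authority = math.log1p(followers) + math.log1p(retweets)
    return freshness * authority


def rank_tweets(tweets, now, half_life=1800.0):
    """Return tweets sorted from highest to lowest stream score.

    tweets: list of dicts with hypothetical keys 'ts', 'followers', 'retweets'.
    """
    return sorted(
        tweets,
        key=lambda t: stream_score(
            now - t["ts"], t["followers"], t["retweets"], half_life
        ),
        reverse=True,
    )
```

With these settings, a half-hour-old Tweet from a widely followed account can still outrank a brand-new one from an unknown account, but after several hours the same Tweet sinks below it. Tuning that balance is exactly the open question.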

Many startups are tackling this problem, as is Twitter itself. And now Google and Bing can try their hand at finding the most important bits of data in the firehose. The result should be a faster, more relevant feedback loop between data appearing on the Internet and the search engines finding it.

Photo credit: Flickr/ZeroOne
