The Real Time Search Dilemma: Consciousness Versus Memory

One of the hottest areas of search at the moment is real time search, which attempts to surface results based on what is happening right now. Twitter’s search engine is fast becoming one of the key ways to navigate the service and discover what people are thinking about any subject at any given moment. Facebook is testing out ways to let you search your personal stream. Google is waking up to the challenge as well (Larry Page is particularly concerned with keeping up).

Every week, it seems, a new startup launches that tackles real time search from a different angle (Collecta, OneRiot, Scoopler, Topsy, Almost.at, Tweetmeme, CrowdEye, and Omgili, to name a few). They are trying to apply real time search to all the different streams of information flowing over the Internet right now: Twitter, Facebook feeds, Digg submissions, blog comments, RSS feeds, Flickr photos, YouTube uploads, shared links on bit.ly and elsewhere. The list keeps getting longer every day.

There is something about human nature that makes us want to prioritize information by how recent it is, and that is the fundamental appeal of real time search. The difference between real time search and regular search didn’t really crystallize for me until I had a conversation with Edo Segal, who sold his real time search company Relegence to AOL a few years ago and holds three patents on the subject. “Real time taps into consciousness,” says Segal, “search taps into memory. That is why it is so potent. You experience the world in real time.”

This raises an interesting dilemma. If real time data streams are akin to the living consciousness of the Web, how do you search them? How do you search consciousness? It is not the same as searching memory, which is what Google does when it looks at its indexed archive of the Web and at how those pieces of information build up authority over time. The real time search dilemma centers on how to rank results, and how to resolve the tension between recency and relevancy.

The default, or at least the starting point, for most real time search engines is simply to put the most recent results on top and keep pushing them down in a free-flowing river of information as new results matching the query come in. That is what Twitter search does, for instance: it shows a chronological stream of the most recent Tweets containing a particular set of keywords. Real time search startup Collecta takes the same approach, presenting the stream as it comes in and letting you filter by source. Ranking results any other way would reorder them and, by definition, make them less real-time.
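To make the "river" model concrete, here is a minimal sketch in Python of a purely chronological stream: every incoming post that contains all the query keywords goes on top, with an optional source filter in the spirit of Collecta's. The `Post` and `ChronologicalStream` names are hypothetical, and the sketch assumes posts arrive in the order they were published; it is not how Twitter or Collecta actually implement search.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Post:
    text: str
    source: str          # e.g. "twitter", "flickr" (illustrative labels)
    timestamp: datetime  # kept for display; ordering relies on arrival order


class ChronologicalStream:
    """Hypothetical newest-first result stream for a keyword query."""

    def __init__(self, keywords, sources=None):
        self.keywords = [k.lower() for k in keywords]
        # None means "accept every source"
        self.sources = set(sources) if sources else None
        self.results = []  # newest first

    def ingest(self, post):
        # Optional source filter
        if self.sources is not None and post.source not in self.sources:
            return
        text = post.text.lower()
        # Case-insensitive substring match on every keyword
        if all(k in text for k in self.keywords):
            # Assumes chronological arrival: newest result pushed to the top
            self.results.insert(0, post)
```

Nothing here weighs importance; a post's position depends only on when it arrived, which is exactly what keeps the stream "real-time" and also what makes it noisy.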

Yet an unfiltered stream generates too much noise, so other approaches add further ranking factors. OneRiot, for instance, is developing what it calls PulseRank, which takes into account the freshness of the information, the link authority of the Web page it comes from, the authority of the person sharing the link, and the velocity with which the information is being passed around the Web. This seems like a reasonable approach, but it may not catch something important as fast as simply watching the unadulterated stream.
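OneRiot has not published PulseRank's formula, so the following is only a hypothetical sketch of how the four signals named above might be blended into one score. The weights, the one-hour half-life for freshness, the `VELOCITY_SCALE` constant, and the 0-to-1 authority scores are all invented for illustration.

```python
from dataclasses import dataclass

# Hypothetical tuning constants, not OneRiot's values
WEIGHTS = {"freshness": 0.4, "link": 0.2, "sharer": 0.2, "velocity": 0.2}
HALF_LIFE_MINUTES = 60.0  # freshness halves every hour (assumption)
VELOCITY_SCALE = 50.0     # shares/hour at which velocity reaches 0.5 (assumption)


@dataclass
class Item:
    url: str
    age_minutes: float
    link_authority: float    # assumed pre-computed, 0..1
    sharer_authority: float  # assumed pre-computed, 0..1
    shares_last_hour: float


def freshness(age_minutes):
    # Exponential decay: 1.0 when brand new, 0.5 after one half-life
    return 0.5 ** (age_minutes / HALF_LIFE_MINUTES)


def velocity(shares):
    # Saturating transform so a viral link cannot dominate without bound
    return shares / (shares + VELOCITY_SCALE)


def score(item):
    # Weighted sum of the four signals, each roughly in 0..1
    return (WEIGHTS["freshness"] * freshness(item.age_minutes)
            + WEIGHTS["link"] * item.link_authority
            + WEIGHTS["sharer"] * item.sharer_authority
            + WEIGHTS["velocity"] * velocity(item.shares_last_hour))


def rank(items):
    return sorted(items, key=score, reverse=True)
```

With these made-up weights, a four-hour-old link from an authoritative site that is being shared heavily outranks a brand-new post with no authority or sharing behind it. That illustrates the trade-off in the paragraph above: the blended score filters noise, but a genuinely important item that has only just appeared must wait for authority and velocity to build before it rises.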

There are other approaches as well. You can look at what people on the Web are actually doing in real time, or look for variations in the stream of mentions for any given keyword to notice spikes of activity. When everyone is talking about Michael Jackson or Iran above and beyond the normal level of chatter for those topics, that is when you want to be told to pay attention. So maybe real time search is more like an alert system.
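One simple way to notice chatter "above and beyond the normal level" is to compare each interval's mention count for a keyword against a trailing baseline and flag counts several standard deviations above it. The Python sketch below is a hypothetical illustration of that z-score style check; the window size, threshold, and `min_std` floor are assumptions, and none of the startups mentioned here has said they work this way.

```python
from collections import deque
from statistics import mean, pstdev


class SpikeDetector:
    """Hypothetical per-keyword spike alert over fixed-length intervals."""

    def __init__(self, window=12, threshold=3.0, min_std=1.0):
        self.window = window        # number of past intervals in the baseline
        self.threshold = threshold  # how many std devs above the mean counts as a spike
        self.min_std = min_std      # floor so a perfectly flat baseline is not hair-trigger
        self.history = {}           # keyword -> recent counts

    def observe(self, keyword, count):
        """Record one interval's mention count; return True if it is a spike."""
        hist = self.history.setdefault(keyword, deque(maxlen=self.window))
        spike = False
        # Only judge once a full baseline window has accumulated
        if len(hist) == self.window:
            mu = mean(hist)
            sigma = max(pstdev(hist), self.min_std)
            spike = count > mu + self.threshold * sigma
        hist.append(count)  # the new count joins the baseline either way
        return spike
```

Because flagged counts are folded back into the baseline, a sustained surge stops alerting after a while; a production system might exclude spikes from the baseline or use separate short and long windows.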

Can you search consciousness, or can you only watch it pass by? We’ll be debating this at one of the panels on real time search at our Real Time Stream CrunchUp in July. But it is clear that in order to make sense of the stream, it needs to be ranked by order of importance as well as by time.

(Photo credit: Flickr/Andrew Sea)
