While much of the talk leading up to today was about Twitter’s move into the photo game, the bigger news is actually what they’ve done to their search product. They’ve completely rebuilt it. And while it may not be immediately apparent, the product should be much, much better than before.
Twitter details the project in a long post on their Engineering blog today. Notably, they go into the backstory of Twitter Search, which evolved from the Summize purchase in 2008. While that product worked well for a while, the technologies behind it would not allow it to scale to the level that Twitter eventually needed. So things had to be re-written — on the fly.
Twitter detailed some of this last October. But it wasn’t until this past April that they were able to replace the old Ruby on Rails front-end with the newly-built Blender. At the time, Twitter said this made search 3x faster and gave them 10x throughput. This is important since they’re now seeing 2,200 tweets per second on average and serving up 18,000 queries per second, or roughly 1.6 billion queries per day. That’s up from 1 billion last October.
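For anyone who wants to check those numbers, here is a quick back-of-the-envelope sketch in Python. It assumes the per-second rates hold steady across a full day, which is a simplification on my part, not something Twitter states.

```python
# Back-of-the-envelope check of the quoted rates.
# Assumption: the per-second averages hold for all 86,400 seconds of a day.
SECONDS_PER_DAY = 24 * 60 * 60

queries_per_second = 18_000
tweets_per_second = 2_200

queries_per_day = queries_per_second * SECONDS_PER_DAY
tweets_per_day = tweets_per_second * SECONDS_PER_DAY

print(f"{queries_per_day:,} queries per day")  # 1,555,200,000
print(f"{tweets_per_day:,} tweets per day")    # 190,080,000
```

So 18,000 queries per second works out to about 1.56 billion a day, which rounds to the 1.6 billion figure.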
But that’s still mainly back-end talk. The key to today’s search announcement is what is now being surfaced on the front-end. “Blender completed the infrastructure necessary to make the most significant user-facing change to Twitter search since the acquisition of Summize,” Twitter writes.
Notably, Twitter now has a “Most relevant” tab on the search results page. And while at first glance it may seem that this is simply searching your contacts’ tweets (something that is long overdue) and displaying them in reverse chronological order, there’s actually a lot more going on. At its most basic, here’s how Twitter says to think about it: “Often, users are interested in only the most memorable Tweets or those that other users engage with. In our new search experience, we show search results that are most relevant to a particular user. So search results are personalized, and we filter out the Tweets that do not resonate with other users.”
Twitter cites three key types of signals they’re looking for:
- Static signals, added at indexing time
- Resonance signals, dynamically updated over time
- Information about the searcher, provided at search time
Based on these, a “personal relevance score” is computed for each tweet. “The highest-ranking, most-recent Tweets are returned to the Blender, which merges and re-ranks the results before returning them to the user,” Twitter notes.
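To make that a bit more concrete, here is a minimal Python sketch of how three kinds of signals could be folded into one score, with results from several index partitions then merged and re-ranked. Twitter has not published its formula, so everything here is hypothetical: the `ScoredTweet` fields, the weights, the "does the searcher follow the author" signal, and the function names are all my own stand-ins.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoredTweet:
    tweet_id: int
    author_id: int
    static_score: float     # hypothetical: fixed when the tweet is indexed
    resonance_score: float  # hypothetical: updated as other users engage


def personal_relevance(tweet, followed_ids, weights=(1.0, 2.0, 3.0)):
    """Combine the three signal types into one score (assumed linear mix)."""
    w_static, w_resonance, w_searcher = weights
    # Hypothetical searcher signal: does the searcher follow the author?
    searcher_signal = 1.0 if tweet.author_id in followed_ids else 0.0
    return (w_static * tweet.static_score
            + w_resonance * tweet.resonance_score
            + w_searcher * searcher_signal)


def merge_and_rerank(partitions, followed_ids, limit=20):
    """Merge per-partition results and keep the top `limit` by score."""
    candidates = [tweet for partition in partitions for tweet in partition]
    return heapq.nlargest(
        limit, candidates,
        key=lambda tweet: personal_relevance(tweet, followed_ids),
    )


partition_a = [ScoredTweet(1, 10, 0.5, 0.1), ScoredTweet(2, 11, 0.2, 0.9)]
partition_b = [ScoredTweet(3, 12, 0.1, 0.1)]
top = merge_and_rerank([partition_a, partition_b], followed_ids={12}, limit=2)
print([tweet.tweet_id for tweet in top])  # [3, 2]
```

In this toy run, tweet 3 jumps to the top only because the searcher follows its author, which is the kind of personalization Twitter describes. The real weighting is surely more involved.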
Another big thing going on? Duplicates are removed. This has been a huge issue with Twitter search in the past.
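Twitter doesn’t spell out how duplicates are detected, but one simple way to think about it is to normalize each tweet’s text and keep only the first one seen. The Python sketch below does that under my own assumptions: the `normalize` rules (dropping leading "RT @user:" prefixes, collapsing whitespace, lowercasing) are illustrative and are not Twitter’s actual method.

```python
import re

# Hypothetical normalization rules, chosen only for illustration.
_RT_PREFIX = re.compile(r"^(rt\s+@\w+:?\s*)+", re.IGNORECASE)
_WHITESPACE = re.compile(r"\s+")


def normalize(text):
    """Reduce a tweet to a comparison key: no RT prefix, single spaces, lowercase."""
    text = _RT_PREFIX.sub("", text.strip())
    return _WHITESPACE.sub(" ", text).lower()


def drop_duplicates(tweets):
    """Keep the first tweet for each normalized key, preserving input order."""
    seen = set()
    unique = []
    for text in tweets:
        key = normalize(text)
        if key not in seen:
            seen.add(key)
            unique.append(text)
    return unique


results = drop_duplicates([
    "Twitter rebuilt search",
    "RT @bob: Twitter rebuilt search",
    "twitter  rebuilt   search",
    "Photos are here",
])
print(results)  # ['Twitter rebuilt search', 'Photos are here']
```

A production system would likely need fuzzier matching than exact keys (shortened links and trailing comments differ between copies), so treat this as the simplest possible version of the idea.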
Finally, Twitter has begun surfacing images and videos for searches. Right now, these are shown in the right-side pane when a search is done on twitter.com. Because they’re different from text-based tweets, these queries have to be handled differently.
So all of this sounds great. But it’s just step one. Twitter says that in the coming months, quality will improve as will scale. And they’ll be bringing these relevant searches to their mobile products.
As for full tweet history search? I still wouldn’t hold my breath. Real-time is simply more important to Twitter right now, so they’re focusing on that, instead of the past. Hopefully one day…