The problem with Twitter is that it is too noisy. Filtering the signal from the noise is still too burdensome. The founders of search engine Kosmix think they have an answer with a new product called Tweetbeat, which they are unleashing in a preview version designed specifically to filter all the Tweets about the World Cup soccer tournament. Tweetbeat ingests the entire firehose of 65 million Tweets a day, and spits out only those about the World Cup which it deems to be the most popular and important. It tries to capture everything from news to teams, players and fan shout-outs.
What’s more impressive, though, is that along the left-hand side are flag icons for the 32 teams. When you click on a flag, you see Tweets only about that team. You can follow only Brazil, England, Nigeria, or whatever team makes you want to cover yourself with body paint. The name of the team or “World Cup” doesn’t even have to be in the Tweet. Tweetbeat recognizes individual player names such as Cole or Maradona, nicknames, teams, even stadiums, and it delivers all of these Tweets in realtime. A slider at the top allows you to adjust the speed at which the stream flows down the page. Next week, Tweetbeat will be available as an iPhone app and desktop widget, and sites like MySpace plan to use the data in their own widgets.
Some early findings from the day before the first game (English-language Tweets only):
- Overall, the World Cup is the second most popular category on Twitter over the past 24 hours, behind only Justin Bieber.
- England is getting the most Tweets, with 40 percent more than any other team, followed by the U.S. Brazil is fifth (remember, in English), and Paraguay is dead-last in popularity on Twitter.
- The most popular players on Twitter are Joe Cole, Wayne Rooney, and Lionel Messi (again, if Kosmix analyzed Portuguese Tweets, I’m sure it would be a different story, but I am still kind of surprised Ronaldo didn’t rank higher).
Under the hood, Kosmix is applying its core semantic search technology to Twitter’s firehose of Tweets to categorize them instantly. Kosmix has created a taxonomy of the Web which spans more than 10 million topics and their relationships. It doesn’t rely on hashtags or keywords, but on relationships, influence, and trending clusters. Kosmix co-founder Anand Rajaraman explains:
To determine whether a tweet is part of a trending story, Tweetbeat creates real-time clusters of tweets, based on semantic similarity. The tweets in a cluster are about the same story. We then rank stories using a combination of many different real-time signals, including the number of tweets, the influence scores of the people who have tweeted the story, and the rate at which the cluster is growing (i.e., “story velocity” and “story acceleration”).
We use a variety of signals to compute a real-time influence score for every active twitter user. The score depends not only on static factors, such as number of followers, but also on dynamic factors, such as retweets and who retweeted. So, being retweeted by an influential user makes you more influential.
He calls this influence score “Krank.” Kosmix is applying these techniques right now to the World Cup teams and players as a showcase of what its technology can do, but later this summer it will release a full version of Tweetbeat across all topics.
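Kosmix has not published how Krank or its story ranking is actually computed, so the following is only a rough sketch of the idea Rajaraman describes, not Tweetbeat's implementation. It assumes a log-scaled follower count as the static factor, a PageRank-style pass over retweets as the dynamic factor, and a simple weighted sum of volume, influence, velocity, and acceleration for ranking stories. Every function name, parameter, and weight here (`influence_scores`, `story_score`, `rank_stories`, `damping`, `window`) is hypothetical.

```python
import math
from collections import Counter


def influence_scores(followers, retweets, rounds=20, damping=0.5):
    """Toy influence score (all choices here are assumptions, not Kosmix's).

    followers: dict mapping user -> follower count (static factor).
    retweets:  list of (retweeter, original_author) pairs (dynamic factor).
    """
    # Static part: follower count, log-scaled so huge accounts do not dominate.
    base = {user: math.log1p(count) for user, count in followers.items()}
    # Each retweeter splits the influence they pass on across their retweets.
    retweets_made = Counter(retweeter for retweeter, _ in retweets)
    score = dict(base)
    for _ in range(rounds):
        updated = dict(base)
        for retweeter, author in retweets:
            share = score.get(retweeter, 0.0) / retweets_made[retweeter]
            updated[author] = updated.get(author, 0.0) + damping * share
        score = updated
    return score


def story_score(tweets, influence, now, window=600.0):
    """Score one cluster of tweets, given as (author, timestamp) pairs."""
    recent = sum(1 for _, t in tweets if now - window < t <= now)
    previous = sum(1 for _, t in tweets if now - 2 * window < t <= now - window)
    velocity = recent                  # tweets in the latest window
    acceleration = recent - previous   # change versus the window before it
    authors = {author for author, _ in tweets}
    total_influence = sum(influence.get(a, 0.0) for a in authors)
    # Equal weights are an arbitrary placeholder for illustration.
    return math.log1p(len(tweets)) + total_influence + velocity + acceleration


def rank_stories(stories, influence, now, window=600.0):
    """Return story ids ordered from highest to lowest score."""
    return sorted(
        stories,
        key=lambda sid: story_score(stories[sid], influence, now, window),
        reverse=True,
    )
```

In this sketch, a user retweeted by a high-follower account ends up with a higher score than an otherwise identical user retweeted by a small account, and a cluster whose tweets arrived in the last few minutes outranks a same-sized cluster that has gone quiet. The clustering step itself (grouping tweets by semantic similarity) is left out, since the article gives no detail on how Kosmix does it.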