Digg has released some materials around their new Recommendation Engine, which we wrote about last night, and say that it will be released this week. Two overview videos are below, including an interview with Digg Lead Scientist Anton Kast. We’ve also included the text of a white paper on the Recommendation Engine.
The Digg Recommendation Engine
People love Digg because it’s a place to discover and share great content from around the Web. The Digg homepage always has the most popular stories, but many Digg users find their content in the Upcoming section, which gets over 15,000 new stories a day. To help users filter this enormous amount of content, we have created a new feature: the Digg Recommendation Engine.
When you Digg a story, you tell the Recommendation Engine two things: that you recommend the story to other users and, less obviously, that the users who Dugg the story before you are good at finding content. The Recommendation Engine keeps track of the users who Dugg particular stories before you did, and it recommends to you the stories they Dugg. The more content you Digg, the smarter the Recommendation Engine gets.
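A minimal Python sketch of the bookkeeping just described, assuming a simple in-memory store. The dictionaries `diggers_by_story` and `earlier_diggers` and the functions `digg` and `naive_recommendations` are hypothetical names for illustration; Digg has not published its implementation. This naive version recommends everything any earlier Digger has Dugg, which is the starting point that the correlation and cutoff steps below refine.

```python
from collections import defaultdict

# Hypothetical in-memory stores (not Digg's actual data model).
diggers_by_story = defaultdict(list)  # story id -> user ids, in Digg order
earlier_diggers = defaultdict(set)    # user id -> users who Dugg a story before them


def digg(user, story):
    """Record a Digg and remember everyone who Dugg the story earlier."""
    diggers = diggers_by_story[story]
    if user in diggers:
        return  # a user can Digg a given story only once
    earlier_diggers[user].update(diggers)
    diggers.append(user)


def naive_recommendations(user):
    """Stories Dugg by the user's earlier Diggers that the user has not Dugg."""
    mine = {s for s, users in diggers_by_story.items() if user in users}
    recs = set()
    for other in earlier_diggers[user]:
        recs.update(s for s, users in diggers_by_story.items() if other in users)
    return recs - mine
```

For example, after `digg("alice", "s1")`, `digg("bob", "s1")`, and `digg("alice", "s2")`, Bob's only earlier Digger is Alice, so `naive_recommendations("bob")` returns `{"s2"}`.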
Finding Diggers Like You
The Digg Recommendation Engine uses your Digg history over the last thirty days to make Recommendations. (You can see the number of items you have Dugg over the last month on the right-hand side of the Recommended view.) Every time you Digg a story, the Engine matches you with other Diggers who Dugg the same story and keeps track of all your Diggs in common with them.
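The matching step can be sketched as counting shared Diggs inside the thirty-day window. The tuple log format and the name `common_digg_counts` are assumptions; for simplicity this sketch matches anyone who Dugg the same story, regardless of who Dugg it first.

```python
from collections import Counter
from datetime import datetime, timedelta

WINDOW = timedelta(days=30)  # the thirty-day history described above


def common_digg_counts(diggs, user, now):
    """Count Diggs in common between `user` and every other Digger.

    `diggs` is an iterable of (user_id, story_id, dugg_at) tuples, a
    hypothetical log format. Only Diggs inside the window are considered.
    """
    recent = {(u, s) for u, s, t in diggs if now - t <= WINDOW}
    my_stories = {s for u, s in recent if u == user}
    return Counter(u for u, s in recent if s in my_stories and u != user)
```

A Digg older than thirty days drops out on both sides: if you Dugg a story 40 days ago, another user's recent Digg of it no longer counts as a match.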
When it’s time to calculate your Recommendations, the Engine draws from this pool of matched Diggers. For each matched Digger, it computes a correlation coefficient between you and them. It then picks a cutoff for this correlation coefficient, and the Diggers who make the cut are called “Diggers Like You.”
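Selecting Diggers Like You then reduces to a threshold over the computed coefficients. The white paper does not say how the cutoff is chosen at this step (the diversity section notes that it is adjusted), so this sketch takes a fixed, hypothetical `cutoff` argument; `diggers_like_you` is an illustrative name.

```python
def diggers_like_you(correlations, cutoff):
    """Return (user, coefficient) pairs at or above the cutoff, best match first.

    `correlations` maps each matched Digger to their correlation coefficient
    with you; `cutoff` is a hypothetical fixed threshold.
    """
    chosen = [(u, c) for u, c in correlations.items() if c >= cutoff]
    return sorted(chosen, key=lambda pair: pair[1], reverse=True)
```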
It’s easy to understand how the correlations are calculated. For each user with whom you Dugg something in common, the Engine counts how many stories the two of you Dugg in common and divides that number by the total number of distinct stories that either of you Dugg. The ratio is a correlation coefficient, a number between zero and one (zero if you and the other user never agreed; one if you always did). Such a ratio is sometimes called a “Jaccard coefficient.”
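In set notation, if A is the set of stories you Dugg and B is the set another user Dugg, the coefficient is |A ∩ B| / |A ∪ B|. A direct Python translation follows; returning 0.0 when neither user has Dugg anything is our assumption, since the white paper does not cover that edge case.

```python
def jaccard(mine, theirs):
    """Jaccard coefficient: stories in common divided by stories either user Dugg."""
    mine, theirs = set(mine), set(theirs)
    union = mine | theirs
    if not union:
        return 0.0  # assumption: no history means no correlation
    return len(mine & theirs) / len(union)
```

For example, `jaccard({"a", "b", "c"}, {"b", "c", "d"})` gives 2 / 4 = 0.5.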
This scheme automatically accounts for the overall level of Digging activity. If another user Diggs a lot, they have to agree with you on many stories to become a Digger Like You. If another user Diggs rarely, then a small amount of agreement can suffice.
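A worked example of that normalization, using illustrative numbers rather than Digg data. Because |A ∪ B| = |A| + |B| - |A ∩ B|, the coefficient can be computed from three counts; `jaccard_from_counts` is a hypothetical helper.

```python
def jaccard_from_counts(n_mine, n_theirs, n_common):
    """Jaccard coefficient from counts, using |A ∪ B| = |A| + |B| - |A ∩ B|."""
    return n_common / (n_mine + n_theirs - n_common)


# You Dugg 20 stories in the window (illustrative numbers, not Digg data).
heavy = jaccard_from_counts(20, 200, 15)  # Diggs a lot, 15 stories in common
light = jaccard_from_counts(20, 5, 4)     # Diggs rarely, 4 stories in common
print(f"heavy: {heavy:.3f}, light: {light:.3f}")  # heavy: 0.073, light: 0.190
```

The light Digger agrees with you on fewer stories (4 versus 15) yet ends up more than twice as compatible, because they Digg so much less overall.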
From Diggers Like You to Recommendations
Once the Engine has determined your Diggers Like You, your Recommendations consist of stories that your Diggers Like You have already Dugg, minus the stories you have already Dugg or Buried. There are some extra steps, like the diversity rules and the promotability constraint described below, but this is the basic idea.
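The basic step, before diversity rules and promotability are applied, is a set union followed by a set difference. A sketch assuming each user's Diggs are available as a set of story ids; `basic_recommendations` and its parameters are hypothetical.

```python
def basic_recommendations(diggers_like_you, diggs_by_user, dugg, buried):
    """Union of stories Dugg by Diggers Like You, minus your own Diggs and Buries.

    `diggs_by_user` maps a user id to the set of story ids they Dugg
    (a hypothetical structure). Diversity rules and the promotability
    check are deliberately left out here.
    """
    candidates = set()
    for user in diggers_like_you:
        candidates |= diggs_by_user.get(user, set())
    return candidates - set(dugg) - set(buried)
```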
Recommendations are always displayed together with your Diggers Like You and their compatibility percentages. These percentages are just the correlation coefficients. You may notice that you are more compatible with a user who has fewer Recommendations for you than with a less compatible user who has more. This is because, although you have Dugg more items in common with the more compatible user, that user has not Dugg as much overall, so there are fewer of their stories left to recommend.
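A small illustration of that effect, with made-up story ids: user P has Dugg little but overlaps heavily with you, while user Q has Dugg more but overlaps less. `compatibility_report` is a hypothetical helper.

```python
def compatibility_report(mine, theirs):
    """Compatibility (Jaccard) and how many new stories one user could contribute."""
    common = mine & theirs
    return {
        "compatibility": len(common) / len(mine | theirs),
        "in_common": len(common),
        "recommendations": len(theirs - mine),
    }


you = {"a", "b", "c", "d"}
p = {"a", "b", "c", "x"}            # Diggs little, overlaps a lot
q = {"a", "b", "w", "x", "y", "z"}  # Diggs more, overlaps less
for name, theirs in [("P", p), ("Q", q)]:
    r = compatibility_report(you, theirs)
    print(f"{name}: {r['compatibility']:.0%} compatible, "
          f"{r['in_common']} in common, {r['recommendations']} to recommend")
# P: 60% compatible, 3 in common, 1 to recommend
# Q: 25% compatible, 2 in common, 4 to recommend
```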
The Recommendations you get from any particular user will come from topics (such as Technology or World News) where the two of you have a shared Digging history. We figure that two users may have similar interests in a subject like ‘playable web games’ even though one person is into politics while the other follows celebrity gossip. So the Engine actually computes correlations, Diggers Like You, and Recommendations independently in each of several collections of topics.
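Computing correlations per collection amounts to restricting both users' histories to the topics in that collection before taking the Jaccard coefficient. The grouping in `TOPIC_COLLECTIONS` (including the Gaming and Politics topic names) and the function name are hypothetical; the white paper does not list Digg's collections.

```python
# Hypothetical grouping of topics into collections; Digg's actual groups are not published.
TOPIC_COLLECTIONS = {
    "tech_and_games": {"Technology", "Gaming"},
    "news": {"World News", "Politics"},
}


def per_collection_correlations(mine, theirs, topic_of):
    """Jaccard coefficient computed separately inside each topic collection.

    `mine` and `theirs` are sets of story ids; `topic_of` maps a story id to its topic.
    """
    result = {}
    for name, topics in TOPIC_COLLECTIONS.items():
        a = {s for s in mine if topic_of[s] in topics}
        b = {s for s in theirs if topic_of[s] in topics}
        union = a | b
        result[name] = len(a & b) / len(union) if union else 0.0
    return result
```

With `topic_of = {"g1": "Gaming", "g2": "Gaming", "p1": "Politics", "p2": "Politics"}`, two users who both Dugg `g1` and `g2` but split on politics (`p1` versus `p2`) score 1.0 in `tech_and_games` and 0.0 in `news`, whereas a single pooled coefficient would blur this into 2 / 4 = 0.5.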
Since the Recommendation Engine works only with Upcoming stories, all the stories you get from the Recommendation Engine are “promotable,” meaning that they are recent enough to be eligible for the Digg homepage but haven’t appeared there yet. This means that whenever you Digg one of your Recommendations, you are helping select stories for the front page of Digg!
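The promotability filter could be sketched as follows. The white paper does not state how recent a story must be to remain eligible, so `PROMOTION_WINDOW` (24 hours here) is a placeholder, and the `Story` record is a hypothetical shape.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical eligibility window; the real value is not given in the white paper.
PROMOTION_WINDOW = timedelta(hours=24)


@dataclass
class Story:
    story_id: str
    submitted_at: datetime
    promoted: bool  # True once the story has reached the homepage


def is_promotable(story, now):
    """Recent enough for the homepage, and not promoted yet."""
    return not story.promoted and now - story.submitted_at <= PROMOTION_WINDOW


def promotable_only(stories, now):
    """Keep only the stories that are still eligible for the homepage."""
    return [s for s in stories if is_promotable(s, now)]
```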
Just like stories on the homepage, we want your Recommendations to be diverse: a balanced number of stories, not all on the same topic, and not all Dugg by the same Diggers.
To make sure that your Recommendations are diverse, the Engine imposes limits that keep things from getting too focused. It makes sure that no one Digger Like You determines too many of your stories. It attempts to make your Recommendations reflect the spectrum of topics that you’ve Dugg in the past, and it adjusts the compatibility cutoff for Diggers Like You so you don’t get too many or too few stories.
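One way to combine two of these limits, a per-Digger cap and an adjustable cutoff, is sketched below. It tries the strictest cutoff first and relaxes it until enough stories are found, then trims any excess. The cutoff ladder, `max_per_digger`, and `target` are hypothetical values, and the topic-spectrum rule is not modeled.

```python
def diverse_recommendations(correlations, candidates_by_user,
                            cutoffs=(0.5, 0.3, 0.2, 0.1),
                            max_per_digger=3, target=10):
    """Cap each Digger Like You's contribution and relax the cutoff until
    `target` stories are found (or the cutoffs run out).

    `correlations` maps user -> coefficient; `candidates_by_user` maps
    user -> ordered list of candidate story ids. All defaults are hypothetical.
    """
    picked = []
    for cutoff in cutoffs:  # try the strictest cutoff first
        picked, seen = [], set()
        dly = sorted((u for u, c in correlations.items() if c >= cutoff),
                     key=lambda u: correlations[u], reverse=True)
        for user in dly:
            taken = 0
            for story in candidates_by_user.get(user, []):
                if taken == max_per_digger:
                    break  # no single Digger Like You dominates
                if story not in seen:
                    seen.add(story)
                    picked.append(story)
                    taken += 1
        if len(picked) >= target:
            break  # enough stories; stop relaxing the cutoff
    return picked[:target]  # trim so there are not too many
```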
The Engine also limits the influence of any single one of your Diggs. For instance, if you are Digg number 1,000 on a popular story, you will have 999 similar users from that one Digg alone, and those users are not necessarily more compatible with you than the two or three who may have Dugg a less popular story you also liked. The Engine limits the total pool of users you can get from a single Digg to balance things out.
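A sketch of limiting the pool from any single Digg. The white paper does not say which users are kept when the cap applies; keeping the earliest Diggers is our assumption, and both `matched_pool` and the cap of 50 are hypothetical.

```python
from collections import Counter


def matched_pool(my_stories, diggers_by_story, me, cap=50):
    """Count matches with earlier Diggers, taking at most `cap` users per Digg.

    `diggers_by_story` maps a story id to user ids in Digg order. Keeping the
    earliest Diggers is an assumption; the white paper only says the pool
    from a single Digg is limited.
    """
    pool = Counter()
    for story in my_stories:
        order = diggers_by_story[story]
        earlier = order[:order.index(me)]  # users who Dugg before me
        pool.update(earlier[:cap])
    return pool
```

In the Digg-number-1,000 case, the 999 earlier Diggers contribute only 50 matches, so the two Diggers of a niche story are not drowned out.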
We hope you enjoy using the Recommendation Engine and look forward to helping you uncover even more great stories on Digg!
Anton Kast – Lead Scientist Digg