Netflix recommendation system improvements

There hasn’t been much buzz lately about that Netflix programming contest. You remember the one: Netflix put up $1 million (US) for any team that could improve its movie recommendation system by 10%. When I was a Netflix user I was never really dissatisfied with the recommendation system, so it’s hard for me to imagine how they’d gauge that it was 10% better than before. Nonetheless, lots of very smart people took up the challenge, and a great deal of progress has been made.
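For what it’s worth, "10% better" had a precise meaning in the contest. The Netflix Prize scored entries by the root mean squared error (RMSE) between predicted and actual star ratings on ratings held back from the public data. The grand prize required an RMSE 10% lower than that of Netflix’s own system, Cinematch. The sketch below shows that arithmetic on a tiny set of made-up ratings. The function names `rmse` and `improvement_pct` and all the numbers are hypothetical; this is not Netflix’s scoring code.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between two equal-length lists of ratings."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty lists of the same length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


def improvement_pct(baseline_rmse, new_rmse):
    """Percent reduction in RMSE relative to the baseline system."""
    return 100.0 * (baseline_rmse - new_rmse) / baseline_rmse


# Hypothetical toy data: true star ratings plus predictions from two systems.
actual = [4, 3, 5, 2, 4]
baseline = [3.5, 3.5, 4.0, 3.0, 3.5]
improved = [3.8, 3.2, 4.5, 2.5, 3.9]

old = rmse(baseline, actual)
new = rmse(improved, actual)
print(f"baseline RMSE {old:.3f}, new RMSE {new:.3f}, "
      f"improvement {improvement_pct(old, new):.1f}%")
```

In other words, a 10% improvement means the new system’s typical error on held-out ratings has to shrink to 90% of the old system’s. It does not mean that viewers feel 10% happier with their recommendations.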

Over at IEEE Spectrum there’s a very nice post-mortem from one of the teams that won a Progress Prize. The piece covers some of the methods the team used to improve the recommendation system and closes with this:

Now that the confetti has settled, we have a chance to look back on our work and to ask what this experience tells us. First, Netflix has incorporated our discoveries into an improved version of its algorithm, which is now being tested. Second, researchers are benefiting from the data set that the competition made available, and not just because it is orders of magnitude larger than previous data sets. It is also qualitatively better than other data sets, because Netflix gathered the information from paying customers, in a realistic setting. Third, the competition itself recruited many smart people in this line of research.

Reading this got me thinking about how enormously complex it must be to tease out movie recommendations based solely on an arbitrary number that represents a viewer’s opinion of a movie. That’d be like going to a dating site and expecting to find your soulmate by rating your previous dates from 1 to 5 stars. Crazy!

Obviously it’s in Netflix’s best interest to keep the movie rating system as simple and convenient as possible. If you had to rate each actor in a movie, or rate the plot, the director, or the writers, the system could recommend movies that appeal to you more accurately, but you’d be stuck on the Netflix website for days providing input. Thankfully there are super smart people out there figuring out how to identify patterns from this kind of data.
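Here is a rough illustration of how patterns can be pulled out of nothing but star ratings. It is a minimal sketch of matrix factorization, one of the approaches widely used by Netflix Prize teams. It is not the Progress Prize team’s method or Netflix’s system. The four-user, four-movie ratings, the function names `factorize` and `predict`, and the hyperparameters are all hypothetical. Each user and each movie gets a small vector of learned "taste" factors. Stochastic gradient descent then nudges those vectors so that the global mean rating plus their dot product matches each observed rating.

```python
import random


def predict(mu, P, Q, u, i):
    """Predicted rating: global mean plus dot product of user and item factors."""
    return mu + sum(p * q for p, q in zip(P[u], Q[i]))


def factorize(ratings, n_users, n_items, k=2, lr=0.01, reg=0.05,
              epochs=2000, seed=0):
    """Fit user/item factor vectors by SGD on (user, item, stars) tuples."""
    rng = random.Random(seed)
    P = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_items)]
    mu = sum(r for _, _, r in ratings) / len(ratings)  # global mean rating
    data = list(ratings)
    for _ in range(epochs):
        rng.shuffle(data)
        for u, i, r in data:
            err = r - predict(mu, P, Q, u, i)
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on squared error with L2 regularization.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return mu, P, Q


# Hypothetical ratings: users 0-1 like movies 0-1, users 2-3 like movies 2-3.
# User 0 never rated movie 1, and user 3 never rated movie 3.
ratings = [
    (0, 0, 5), (0, 2, 1), (0, 3, 1),
    (1, 0, 5), (1, 1, 5), (1, 2, 1), (1, 3, 2),
    (2, 0, 1), (2, 1, 1), (2, 2, 5), (2, 3, 5),
    (3, 0, 2), (3, 1, 1), (3, 2, 5),
]
mu, P, Q = factorize(ratings, n_users=4, n_items=4)
print(f"user 0, movie 1: {predict(mu, P, Q, 0, 1):.2f}")
print(f"user 3, movie 3: {predict(mu, P, Q, 3, 3):.2f}")
```

Nobody tells the model which movies belong together. It infers from the ratings alone that user 0 resembles user 1 and that movie 1 resembles movie 0, so it should predict a high score for movie 1 even though user 0 never rated it. Real systems typically add per-user and per-movie bias terms and train on vastly more data.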

Now it’s our job to make sure that we don’t accidentally create Skynet in the process…