Google's Wizard Of Oz Search Algorithm And The Threat Of Facebook Search

Google search is powered by algorithms. Computers slice and dice data looking for signals that a web page is more or less interesting than other web pages for a given query. PageRank is a big part of this: Google looks at inbound links to a site as well as the anchor text of those links. But Google also uses lots of other signals to determine the relevance of a web page. They have to, because PageRank on its own is infinitely gameable.

If no one ever tried to game search results, PageRank would work just fine.

Inbound links are simply votes for various web pages. If you take the authority of the linking site into account, it makes for really good search results. That’s why Google was so great in 1999, when there was less incentive to game search results, and less expertise among the people doing it.
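That core idea, links as votes weighted by the authority of the voter, can be sketched as a toy power iteration. This is only an illustration of the concept, not Google's actual implementation, and the three-page graph is invented:

```python
def pagerank(links, damping=0.85, iterations=50):
    """Toy power-iteration PageRank: each page splits its score
    evenly among the pages it links to; a damping factor keeps
    some score flowing to every page."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1 - damping) / len(pages) for p in pages}
        for page, targets in links.items():
            if not targets:
                continue
            # A link is a "vote" weighted by the linker's own rank.
            share = damping * rank[page] / len(targets)
            for t in targets:
                new_rank[t] += share
        rank = new_rank
    return rank

# "hub" collects inbound votes from both other pages, so it wins.
graph = {
    "blog": ["hub"],
    "news": ["hub", "blog"],
    "hub": ["news"],
}
ranks = pagerank(graph)
```

The gameability is visible right in the sketch: anyone who can manufacture inbound links manufactures rank.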

But today all that’s changed. There’s a feeling that Google’s algorithm is falling further and further behind the very motivated people and companies out there fighting it. It’s an arms race, and Google is losing.

Today we saw yet another algorithm change by Google, designed to fight some of the more annoying internet polluters – content farms and scrapers. The arms race continues.

No Humans Involved!

What fascinated me most today was Google’s insistence that they are not directly using the block data they crowdsource from their Chrome extension in determining search relevance.

It’s worth noting that this update does not rely on the feedback we’ve received from the Personal Blocklist Chrome extension, which we launched last week.

But then they talk about how the algorithm is coming up with very similar decisions anyway:

However, we did compare the Blocklist data we gathered with the sites identified by our algorithm, and we were very pleased that the preferences our users expressed by using the extension are well represented. If you take the top several dozen or so most-blocked domains from the Chrome extension, then this algorithmic change addresses 84% of them, which is strong independent confirmation of the user benefits.
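The 84% figure Google cites is simple set overlap: the share of the most-blocked domains that the algorithmic change also demotes. A hypothetical sketch of that comparison (all domain names invented):

```python
def coverage(most_blocked, demoted):
    """Fraction of the most-blocked domains that the algorithm
    also demotes. This mirrors the comparison Google describes;
    it is not Google's actual code."""
    blocked = set(most_blocked)
    return len(blocked & set(demoted)) / len(blocked)

# If the algorithm demotes 42 of the 50 most-blocked domains,
# coverage is 42/50 = 0.84, matching the figure in Google's post.
most_blocked = [f"spam{i}.example" for i in range(50)]
demoted = most_blocked[:42] + ["unrelated.example"]
share = coverage(most_blocked, demoted)
```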

The more I think about this, the stranger it seems to me.

There’s a good explanation for not relying on that data – if they publicly said they did, there would be a huge incentive for SEOs to start manipulating that block data, too. Forget link farms; just hire thousands of people on Mechanical Turk to download the extension and block competitors’ sites. Another angle on the arms race.

But I don’t think that’s why. Like the Wizard of Oz, Google hides behind their mighty and mysterious search algorithm. If good search were as easy as analyzing simple mouse clicks on a web page, all the magic could vaporize.

And if you could somehow remove as much spam as possible from that data, and even slice it demographically, geographically and even personally for a given user, then things might really get sticky.

Particularly if Google didn’t have access to any of that data.

And Facebook did.

One of the most interesting experiments going on in search right now is Blekko’s Facebook Like powered search engine. Search results and search relevance are determined by what your friends have “liked” on Facebook, a very deep store of data indeed.

Facebook has more than half a billion users, and half of those log on every day. These people spend 700 billion minutes a month on the site and share 30 billion pieces of content. Links are being shared and people are clicking “like” to vote for that content. And it turns out that it all adds up to a pretty useful search engine experiment on Blekko.
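A like-powered ranker of the kind Blekko is experimenting with can be sketched very simply: re-order results by how many of the searcher's friends have liked each one. This is a hypothetical illustration with invented data, not Blekko's actual algorithm:

```python
def friend_like_rank(results, friend_likes):
    """Re-rank search results by the number of the searcher's
    friends who have 'liked' each URL. friend_likes maps a
    friend's name to the set of URLs they have liked."""
    def score(url):
        return sum(1 for likes in friend_likes.values() if url in likes)
    # sorted() is stable, so ties keep their original order.
    return sorted(results, key=score, reverse=True)

results = ["hotel-a.example", "hotel-b.example", "hotel-c.example"]
friend_likes = {
    "alice": {"hotel-b.example"},
    "bob": {"hotel-b.example", "hotel-c.example"},
}
ranked = friend_like_rank(results, friend_likes)
```

The hotel two friends vouched for surfaces first, which is exactly the "just a nice hotel in Paris that a friend vouches for" experience, and it needs no web-scale link graph at all.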

Imagine what Google could do with all that data and you start to understand why social is so darned important for them right now. Not to kill Facebook, but to make sure the next great leap in search engine evolution doesn’t happen completely without them. A lot of the searches that Google is really bad at – commerce and travel, for example – can get really good really fast if you can look at deep data from friends about those very things. I don’t need pages and pages of results. Just a nice hotel in Paris that a friend vouches for. Or a movie I’ll enjoy. Or the right set of pots and pans. All that data is right there on Facebook.

It may take Facebook a few years to really start to get interested in search. But there is so much advertising revenue in that business that they can’t ignore it forever. And that must scare Google more than just about anything else.