Harvard Hawking Online Cialis? Welcome to the New Era of “Web Snatchers”

Editor’s Note: The following is a guest post from Michael Markson, VP of Marketing for Internet search engine Blekko, Inc. Previously, Markson was founder and VP of Business Development at news site Topix.

In the 1956 classic horror film Invasion of the Body Snatchers, the citizens of the fictional town of Santa Mira, California, are replaced by perfect physical duplicates, simulations grown from plantlike pods. The pod people work together toward a single goal: replacing the entire human race. You may remember the scene in which the star stares into the camera and yells: “They’re here already! You’re next!”

Fast forward 54 years to the current state of the Web and guess what? They’re here, and you’re next. I’m talking about Web spam and, unfortunately, no one is safe. This week we found that even a venerable institution like Harvard University, which I’d bet invests a great deal in securing its website and protecting its reputation online, can fall victim to the Web Snatchers.

Here’s an example from Harvard’s website:

First, you see the site’s header, which trumpets the Belfer Center for Science and International Affairs at Harvard University. But the page’s headline, “buy cialis online cheap,” quickly alerts you that this ain’t your typical page of medical info. A quick scan of the copy makes the purpose of the hijack clear: 150 of the page’s 850 words, more than one word in six, are devoted to mentions of “generic Cialis” and “Viagra.”

Why snatch this site? Well, as most of you probably know, a site’s ranking in a Web search is determined not only by its relevance to the search query but also by the number and quality of other sites linking to it. A link from a highly reputable domain like Harvard’s passes some of that reputation along to the page it points to, which makes Harvard a very tasty catch for a Cialis spammer. This hijacking almost certainly improved the ranking of the site the spammers were promoting.
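To make the linking argument concrete, here is a toy sketch of PageRank, the classic published link-analysis algorithm, computed by power iteration. It is not a description of how Google, blekko or any other engine actually ranks pages today, and the page names and link graph are hypothetical. It compares a spam page’s score before and after a single link to it is slipped into a well-linked “university” page.

```python
# Toy PageRank via power iteration. Illustrative only: real search engines
# combine many more signals. All page names below are hypothetical.
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with a uniform score
    for _ in range(iterations):
        # Every page gets a small baseline share ("random jump" term).
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its score evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outbound links spreads its score over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank

# A well-linked "university" page and a spam page propped up by a link farm.
links_before = {
    "news-a": ["university"],
    "news-b": ["university"],
    "blog-a": ["university"],
    "blog-b": ["university"],
    "university": ["news-a"],
    "link-farm": ["pills-shop"],
    "pills-shop": [],
}
# The hijack: one extra link from the university page to the spam page.
links_after = dict(links_before)
links_after["university"] = ["news-a", "pills-shop"]

rank_before = pagerank(links_before)
rank_after = pagerank(links_after)
print(f"pills-shop before injection: {rank_before['pills-shop']:.3f}")
print(f"pills-shop after injection:  {rank_after['pills-shop']:.3f}")
```

In this tiny graph the single injected link multiplies the spam page’s score several times over, because the university page passes along a share of the score it collects from everyone linking to it. That is the economics of hijacking a trusted site in miniature.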

Just to ensure maximum value, the hacker has also inserted a table of still more links in an attempt to boost rankings for the term “Cialis.” Their anchor text includes phrases like “cost of viagra 50mg,” “viagra best prices,” and, finally, “cialis Thailand.”

Of course, this is only one form of Web snatching. Here’s another, even more insidious one: imagine you are having medical problems and go online to do some research. You start by visiting a major search engine and running a query or two about your condition. You quickly find a result whose title perfectly matches your query. You click on it and read the content, presuming that since the search engine served it as a top result, it must be trustworthy.

Unfortunately, a closer look reveals that the source of this article isn’t the Mayo Clinic or WebMD. In fact, it wasn’t even written by a doctor. It’s an article produced by eHow.com or EzineArticles.com (or some other content farm), written by a wholly unqualified author who was paid by the word. Just like that, your attention has been snatched from an authoritative site (mayoclinic.com) and handed to an imposter. Ugh.

How does this happen? We’re 15 years into the Web now. Aren’t the algorithms smarter than this? Short answer: no. Behind the scenes of every search, a war is going on. On one side are algorithms designed to cull information from relevant websites and deliver it to you query by query. On the other are imposter sites, created solely for profit and designed to get between you and what you are searching for. Unfortunately, in many cases the bad guys are winning.

As this problem grows, it becomes a real threat to the Web. And trust me, it’s not about to get better. I mentioned eHow.com earlier. eHow is owned by a company called Demand Media, whose sole business is creating this type of imposter page. Demand Media just filed for its IPO. With that kind of reward on the table, do you think more or fewer of these imposters will be created every day?

So, how do we solve this?  First step: stop relying wholly on algorithms. The sober truth is that smarter algorithms can’t save us. There will always be huge economic incentives for hackers to build smarter spam machines. Fooling machines has always been relatively easy. Fooling humans, not so much.

At blekko, we’re taking this threat seriously. Unlike other search engines, we want actual people involved in our ranking. We’ve introduced a feature called ‘slashtags’ that lets you cut out spam sites and search only the sites you want. A small group of users is currently testing this functionality in a private beta. (If you’d like to be included in the beta, email us at techcrunch@blekko.com or follow us on Twitter at @blekko.)

One of the benefits of slashtags is that if you find a site like eHow in your results, you can instantly tag it as spam. Once tagged, it will never show up in your blekko search results again. As we like to say, that site is now dead to you.
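The “never show up again” behavior can be pictured as a per-user blocklist applied to result URLs. The sketch below is a hypothetical illustration, not blekko’s slashtag implementation; the function name `filter_results`, the tagged-domain set and the sample URLs are all assumptions. It drops any result whose host is a tagged domain or one of its subdomains.

```python
from urllib.parse import urlparse

def filter_results(results, blocked_domains):
    """Hypothetical filter: drop results whose host is a blocked domain
    or a subdomain of one. blocked_domains should be lowercase."""
    kept = []
    for url in results:
        host = (urlparse(url).hostname or "").lower()
        # Match the exact domain or any subdomain (e.g. www.ehow.com),
        # but not look-alikes such as notehow.com.
        if any(host == d or host.endswith("." + d) for d in blocked_domains):
            continue
        kept.append(url)
    return kept

# Hypothetical per-user list of domains the user has tagged as spam.
spam_tags = {"ehow.com"}
results = [
    "https://www.mayoclinic.com/health/some-condition",
    "https://www.ehow.com/example-article.html",
    "https://ehow.com/another-example.html",
]
print(filter_results(results, spam_tags))
```

Matching on the hostname rather than the full URL means a single tag covers every page on the site, including its www subdomain.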

Soon we’ll be adding new social tools so that our users can play an even more active role in separating the wheat from the chaff. We could never build a machine that perfectly analyzes the quality of a site. But we know that people can do the job easily and instantly. Our job is to provide a platform where people can share that information in a way that benefits everyone.

In Invasion of the Body Snatchers, the star was told that there was no actual problem; rather, the aberrant behavior he observed was an epidemic of mass hysteria. He didn’t listen. When it comes to the Web today, neither should you. As Harvard can attest, the Web Snatchers are here already, and you could be next.