Google Blog Search – First Impressions

Google Blog Search launched last night and took the blogosphere by storm. We’ve had a chance to bang on it for most of the night and morning and have a few things to report.

Overall, Google Blog Search is a very worthy addition to the ranks of blog search engines.

The Basics

The search is completely separate from normal Google search. It can be accessed in three ways, although the back end service is the same regardless of how you access it:

Advanced search options can be viewed here.

Blogs that use a ping server such as Weblogs.com (profile) have been indexed since June 2005, so older posts are not included in the index.

The engine generally points to posts only, although if there is a good match to your query for an entire blog, Google points to the blog above normal search results (see screen shot below).

You can use the advanced search features to restrict a search to certain languages (35 languages are supported).

Additional information can be found in the FAQs.

What Google Blog Search Does Well

The interface is clean, and the engine is unquestionably fast, about as fast as normal Google searches. However, since its index only goes back to June, it is unfair to compare it to existing blog search engines.

Google is indexing posts by crawling the XML feeds rather than the post HTML. This allows for significantly more structured data (date, author, categories, etc.). However, if the XML feed only includes a summary of the post (as very many do), the full text of the post will not be indexed and therefore cannot be searched, so relevant posts will be missed.
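To make the feed-versus-HTML trade-off concrete, here is a minimal sketch of what indexing from a feed might look like. It is not Google's implementation: the sample feed, the parse_items and search helpers, and the choice of RSS 2.0 with the dc:creator and content:encoded extensions are all our own assumptions for illustration. The second post's feed entry carries only a teaser, so a query for a word that appears only in the full post never matches it.

```python
import xml.etree.ElementTree as ET

# Namespaces for two common RSS extensions (assumed here for illustration).
NS = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "content": "http://purl.org/rss/1.0/modules/content/",
}

# Hypothetical feed: the first item ships its full text, the second only a summary.
SAMPLE_FEED = """<rss version="2.0"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Example Blog</title>
    <item>
      <title>Full-text post</title>
      <link>http://example.com/full</link>
      <pubDate>Wed, 14 Sep 2005 09:00:00 GMT</pubDate>
      <dc:creator>alice</dc:creator>
      <category>search</category>
      <category>google</category>
      <description>Short teaser.</description>
      <content:encoded>The complete post body, which mentions Technorati.</content:encoded>
    </item>
    <item>
      <title>Summary-only post</title>
      <link>http://example.com/summary</link>
      <pubDate>Wed, 14 Sep 2005 10:00:00 GMT</pubDate>
      <dc:creator>bob</dc:creator>
      <category>search</category>
      <description>Short teaser only.</description>
    </item>
  </channel>
</rss>"""


def parse_items(feed_xml):
    """Pull structured fields out of each feed item."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.iter("item"):
        full = item.findtext("content:encoded", default=None, namespaces=NS)
        summary = item.findtext("description", default="")
        items.append({
            "title": item.findtext("title", default=""),
            "date": item.findtext("pubDate", default=""),
            "author": item.findtext("dc:creator", default="", namespaces=NS),
            "categories": [c.text for c in item.findall("category")],
            # Only what the feed provides can be indexed.
            "text": full if full is not None else summary,
            "full_text": full is not None,
        })
    return items


def search(items, term):
    """Naive case-insensitive substring match over the indexed text."""
    term = term.lower()
    return [i["title"] for i in items if term in i["text"].lower()]


if __name__ == "__main__":
    indexed = parse_items(SAMPLE_FEED)
    for entry in indexed:
        print(entry["title"], entry["author"], entry["categories"], entry["full_text"])
    print(search(indexed, "technorati"))
```

The date, author and categories come for free from the feed markup, which is the upside of this approach. The downside shows up in the search call: even if the summary-only post discussed Technorati at length on the blog itself, the index never sees that text.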

Speed is a crucial issue and if they can maintain current search speeds over time, it will be a very large competitive advantage.

Search results can be sorted by date or “relevance”. Sorting by relevance is the default.

Areas to Improve

A few people are noting deficiencies in the current product. Richard MacManus says:

But… is it just me, or is Google Blog Search pretty tame/lame? I don’t think Technorati should give up its day job just yet, despite being hammered in the blogosphere lately.

David Sifry (CEO of Technorati) gives more detail, saying:

I’m sure that they’ll continue to improve over the coming months, perhaps including tags, recent images and links, zeitgeists, blogger tools, and other types of semistructured data. I’m sure that they’ll also start indexing the full-text of blog posts, not just the partial text found in most blog feeds.

Overall, significant room for improvement exists.

  • Google should improve relevance – our initial tests indicate that the relevance leaves much to be desired.
  • Posts older than June must be included somehow
  • Categories and Tags should be shown
  • Images and links should be included
  • Linking posts should be shown
  • Full html of the posts should be crawled, in addition to XML feeds

Conclusions

Google Blog Search is fast, and the ability to sort by relevance or date is an important feature. However, Google Blog Search is not a category killer like the original Google search was. Competitors took a much-expected hit today, but they are still standing.