Google Presents Code Search


Google today launched Code Search, a search engine and index of source code collected from publicly available sources. Google claims that Code Search will be able to find almost any code its crawler can reach, but in a few specific searches it failed to locate code that I have hosted on my own server; this is sure to improve. Google’s index of source code does seem much broader than those of competing sites Krugle and Koders. For instance, Google Code Search indexes the contents of zip files and tarballs on open source sites such as openssl.org, while the other two sites seem to return most of their results from SourceForge and a few other centralized repositories.

The first thing you notice at Google Code Search is that you can use regular expressions in the query field, and there are plenty of search options to help you refine what you are looking for. The Google Code Search front page also has a useful overview with pointers on using the service.

To compare Google Code Search with Krugle and Koders, I searched for “md5 in C”, hoping to find an implementation of the MD5 hash algorithm in C. In Google, I can specify the implementation language in the search query itself, while both Krugle and Koders required me to select the language from a drop-down. Neither Krugle nor Koders filtered results by language very well: both returned implementations in other languages. A deeper problem is that none of these search engines knows you are looking for a simple implementation of MD5; they just match strings against their indexes, so you get some very poor results (such as functions that merely call an MD5 library). Across the three search engines I could not find a good, pure MD5 implementation, just a lot of header files and functions that contained the string ‘md5’.
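To illustrate why plain string matching returns headers and callers, and how a regular-expression query can narrow things down, here is a minimal Python sketch over a tiny, made-up corpus. It is not how Google, Krugle or Koders actually work: the file names, the `CORPUS` contents and the `DEFINITION` pattern are all hypothetical, and the pattern is only a rough heuristic for spotting a C function definition (a return type, an `MD5...` name, and a parameter list not followed by `;`).

```python
import re

# Hypothetical miniature "index": file path -> source text.
# These snippets are invented for illustration only.
CORPUS = {
    "md5.h": (
        "void MD5Init(MD5_CTX *ctx);\n"
        "void MD5Update(MD5_CTX *ctx, const unsigned char *buf, unsigned len);\n"
    ),
    "login.c": (
        '#include "md5.h"\n'
        "void hash_password(const char *pw) {\n"
        "    MD5_CTX ctx;\n"
        "    MD5Init(&ctx);\n"
        "}\n"
    ),
    "md5.c": (
        '#include "md5.h"\n'
        "void MD5Init(MD5_CTX *ctx)\n"
        "{\n"
        "    ctx->buf[0] = 0x67452301;\n"
        "}\n"
    ),
}


def substring_search(corpus, term):
    """Plain string matching: every file that mentions `term`, case-insensitively."""
    term = term.lower()
    return sorted(path for path, text in corpus.items() if term in text.lower())


# Heuristic (an assumption, not a real search syntax): a single line with a
# return type, then an MD5* name, then "(...)" with no ';' after it, optionally
# followed by '{'. Prototypes (ending in ';') and bare calls do not match.
DEFINITION = re.compile(
    r"^[ \t]*\w[\w \t*]*\bMD5\w*[ \t]*\([^;\n]*\)[ \t]*\{?[ \t]*$",
    re.MULTILINE,
)


def definition_search(corpus, pattern):
    """Regex matching: only files where `pattern` finds a likely definition."""
    return sorted(path for path, text in corpus.items() if pattern.search(text))


print(substring_search(CORPUS, "md5"))         # ['login.c', 'md5.c', 'md5.h']
print(definition_search(CORPUS, DEFINITION))   # ['md5.c']
```

The substring query returns the header and the caller along with the actual implementation, which mirrors the results described above; the regex narrows the hits to the file that seems to define the function, though a heuristic like this will still miss or mis-classify plenty of real-world code.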

All of these search engines have a long way to go before they become a shortcut for developers looking for code, especially since most developers are already adept at using ordinary search engines to find what they need. A query like “drop-down menu in ajax” won’t return anything useful, so developers who don’t know the specific string they are looking for in the code will have a hard time. Google’s track record suggests it is the company most likely to get this right, by combining the information in its main search engine with the source code data to produce better results (for example, I can see Google indexing code examples from MSDN rather easily). This looks like bad news for the startups in this space, which will need to innovate further, but it is good news for Google, a company that hasn’t really hit a home run with its recent new products.