Next Year's Headline: Microsoft fails to do anything significant with Powerset

There is one glaring detail that everyone who reported on the Microsoft-Powerset acquisition has failed to mention: Powerset is a Unix-based company. All of its core components (its applications, its packaging system, and its continuous integration system) were designed and written to run on a Unix platform.

As a large software company, Microsoft is a big supporter of its own craftsmanship. Unsurprisingly, all of Microsoft’s products, including Live Search, run on Windows servers. Thus we come to the source of the Microsoft-Powerset mismatch. It is not impossible to port software from Unix to Windows, but there is no denying that it is a hellish task. Powerset has developed a large repository of code over the last three years, and naturally it has opted to use many open source projects in its development stack, all of which would have to be ported as well. A large chunk of the software that it uses is already cross-platform, but the pieces that are not present a real problem.

When Powerset launched, developer Kevin Clark gave a nice shout-out to the open source technologies that Powerset relies on. The Windows compatibility of each of those technologies is given below.

Windows Compatibility of the open source technologies Powerset uses

Hadoop/HBase: Requires Cygwin to run on Windows.
Ruby: Runs slowly on Windows. A good opportunity for Microsoft to put IronRuby to use?
Ruby on Rails: Will run anywhere Ruby will run. IronRuby reached the Rails singularity recently.
Merb: Same situation as Rails. No word yet on IronRuby compatibility.
God: Does not run on Windows, period. Nor is Windows support planned. However, the good samaritan who wrote God happens to work for Powerset. Maybe they could beg him to make a Windows version.
Mongrel: Fully supported on Windows.
MooTools: JavaScript, so its battle is in the browser, not the OS.
Erlang: Good Windows support!
YAWS: Erlang-based, so it also works on Windows.
Memcached: There is an unofficial port for Windows that should not be used in production environments.

To sum it up, anything requiring Cygwin is not going to be capable of operating in a high-traffic production environment like Live.com. The official Ruby interpreter on Windows is too slow, and IronRuby is too new and not sufficiently field-tested to be placed behind Microsoft’s search in any capacity. Using God is not an option, so they are going to have to find some other way to manage their processes on Windows. Replacing Memcached will not be difficult. Finally, Erlang appears to be in good standing on Windows, but the transition will still be rough.

Getting Ruby into an optimal state is important if Microsoft wants to leverage Powerset’s existing IP. According to an interview on the Ruby on Rails podcast, “all parts of the organization use Ruby”. That includes all of their natural language parsing development and at least 60% of their internal tools.

The most likely scenario is that Microsoft will ditch the majority of Powerset’s code and task the team with porting the core technology and implementing the necessary tie-ins to Windows Live Search. The Powerset engineers may take on that assignment resentfully, depending on which platform they feel is superior. Powerset’s core is realistically the semantic search technology it picked up from PARC and has expanded upon. In other words, Microsoft really wants the PARC technology, but it bought Powerset “for their people” because Powerset’s engineers are the only ones who really understand the PARC stuff.

The claim by Microsoft that aspects of Powerset’s semantic search will show up in Live Search by the end of the year is balderdash that Microsoft is broadcasting to its shareholders to help justify its 100 million dollar purchase. Switching to the Windows platform is no small task, and switching to an unfamiliar closed platform magnifies the complexity. However, it is rumored that Microsoft runs some BSD servers internally, in which case the Powerset integration might actually be possible. In the end, this acquisition feels like an R&D experiment for Microsoft. Semantic search is still a gray area of computer science, and Microsoft is taking a chance at winning the lottery by picking up Powerset.