Microsoft intends to use Powerset’s natural language search technology as a major differentiating factor against Google, the No. 1 search player (see our recent coverage of Live Search Cashback, another Microsoft search effort aimed at gaining market share).
TechCrunchIT goes into detail on how effective Powerset may be as a weapon. But a few things are clear – the resource limitations (cash and computing resources) that slowed Powerset’s development are now history. The relevance problem is less important since Microsoft’s core search relevance is quite good. And users really seem to like the beta launch of Powerset, even with the limited dataset.
Naam says 5% of searches contain elements of natural language that keyword-based search algorithms don’t handle well, and there’s an assumption that as better results are returned, more people may start to simply type a normal sentence instead of a couple of keywords. Microsoft will integrate at least parts of Powerset’s technology into Microsoft Live Search by the end of the year, Naam says. I expect we’ll be hearing a lot more about natural language search coming out of Microsoft shortly.
The full interview transcript is below, and you can listen to the MP3 over at TalkCrunch.
Michael Arrington: Hello, this is Mike Arrington with TechCrunch. I have on the line today Barney Pell, the co-founder and CEO of Powerset, which was acquired, or actually announced that it was going to be acquired, by Microsoft; the announcement was made earlier today. And from Microsoft I have Ramez Naam on the phone as well. He’s the group program manager for Microsoft Live Search. Welcome, guys.
Ramez Naam: Thank you.
Barney Pell: Thank you.
MA: So just to be clear, what exactly was announced today? You announced that you’ve signed a deal, but not closed it yet.
RN: That’s right, we’ve signed the deal, but the transaction has not happened yet. We’ve agreed on all terms for Powerset to become a part of the Live Search team at Microsoft.
MA: Is there anything that can happen that could stop the closing at this point?
RN: It would be very, very unexpected for anything to stop the closing at this point.
MA: Okay, and how long do you think it will take to close the deal or, more importantly, to really integrate the teams and move the product forward using Microsoft’s resources?
RN: Well, closing the deal will take a typical amount of time, not too terribly long. As far as integrating the teams, I think we’ll start on that immediately. And this is both a short-term and a long-term task. Short term, we think that Powerset has an amazing team, great people like Tim Converse, Chad Walters, Lorenzo Thione, Scott Prevost and Barney, and they’ll have a big impact on Live Search; before the end of the year you’re going to see significant changes. And then long term, as Barney likes to say, this is a 20-year vision: really understanding what the pages are about and what queries mean. This is the front line of artificial intelligence and computer science, and we’re going to be working on this for quite a long time to come.
MA: Ramez, did you say that you’re going to be integrating the teams right away?
RN: Yeah, essentially yes.
MA: OK, so effectively the deal’s closed. That means that, from an outsider’s perspective, you guys are one company and you’re moving forward right now.
RN: We’re certainly laying our plans right now and talking about what we’re going to start doing. You won’t see any impact until the deal actually closes, but we have a lot of ideas and a lot of conversations.
MA: Barney, before we jump in too much more, I’d like you to give a little bit of background on Powerset: just a couple of minutes on when you first had the idea, the early days when you founded the company, and the basic ideas that resulted in you founding Powerset.
BP: OK, great. Three years ago I was an entrepreneur in residence at Mayfield, a venture capital firm. I was looking at what was going to be the future of search, projecting forward in an open-ended, visionary way and looking at what the major trends were. I had a previous background in artificial intelligence, in taking advanced AI technology from research labs and getting it into the world, either through mission-critical work and mission operations for spacecraft when I was at NASA, or through internet search related technologies, among other things.
I could see that there was going to be a huge amount of computing power becoming available over time, and that a lot of the work in AI, and in particular natural language, was nearing the time where it was going to be commercially ready. These two trends would be converging just as search was becoming the center of our interactions with computers and of tapping into all the information that is out there on the internet. So I could really see that there was a set of trends that were going to converge; it looked like the center of a perfect storm. Having seen this vision, I then went out to evaluate how good this natural language technology was across the different groups and research organizations. I identified the key requirements that would make it work at scale: what kind of properties a natural language search would have to have to work at scale, how the economics would work, and whether now was really the right time. And in my assessment of the different technologies out there, I found that the technology at PARC, after 30 years of development, had come to a point where it was actually ready to be taken out and commercialized, and in principle ought to be able to work at scale. I began negotiating with PARC, with Ron Kaplan, who had been leading the natural language group there for 30 years, and also with his colleague Danny. These were sort of the fundamental guys in computational linguistics over all this time.
And in parallel I found that there was another, related group working with the same PARC technology and already looking to apply it to search on a research basis. These were the folks at Fuji-Xerox Palo Alto labs, including Lorenzo Thione and two of our other key people inside Powerset. They had a similar vision on the research side, and they were already looking at the PARC technology and saying that this should be able to work. So we actually had a shared vision, a shared recognition across all of us, that this could be possible. And Lorenzo wound up saying right away, “Let’s go do this, you know, I want to join you,” and became a co-founder.
We then spent a long time negotiating with PARC to set up the right kind of teaming and partnership relationship, and to ultimately develop a very powerful license for this technology that would work for everyone. During that time we were stealth for a while, and we wound up hiring and building a great team, raising several rounds of funding, and basically building our product. We had to build a lot of challenging infrastructure, taking this natural language technology from the research labs and really making it work on a large scale. That meant bringing together a world-class search team, with people like Chad Walters and Tim Converse coming on board and bringing their expertise, figuring out how to make this stuff work at scale with natural language and the best of search, and then building up a product and user experience team that would be able to make this work in a way that users would understand, so they could see the differentiation and like it.
Ultimately, two years after we hired the first employees, we launched our product a couple of months ago, demonstrating this capability on Wikipedia. And the response, you know we talked about this, Mike, is that people generally really like the system and they just want to have it on the whole web. So I guess that’s a basic tour through the history of the company. It’s only been two years since we hired any employees at all, and now the company is 63 people.
MA: How many of those 63 are search engineers and scientists?
BP: Most of them.
MA: Okay, how much money have you raised?
BP: We reported our Series A round, in which we raised 12.5 million, and that was including our angel bridge round. We actually didn’t report anything after that; clearly we did raise some more money, but, you know, we didn’t report anything.
MA: And what was the acquisition price?
BP: We’re not discussing that.
MA: Really, how about a ballpark? Everyone said 100 million; is that where it ended up?
BP: We’re not even discussing ballparks. We’re not talking about it altogether.
MA: That’s a great background on the company, but let’s talk a little bit about what you actually do that’s different from Yahoo! or Google search, or the search that Microsoft has today that’s keyword based. I’d love to go back to a post you did probably over a year ago now, maybe a year and a half ago, I think on your personal blog, where you talked about Powerset for the first time and what you were trying to accomplish. From a non-technical standpoint, what’s your vision for helping users search?
BP: One way to think about it is that today’s systems don’t really understand language. They don’t understand what a user is really saying and the intent that’s behind the user’s query. They also don’t understand the documents that they are reading, the ones they are ultimately trying to let the users find, and by the way, they don’t understand the ads. So they don’t really understand anything, and they’re based largely on statistical properties. Does this particular stream of characters appear with the right frequency in the right locations on certain pages? Does it all match? And that kind of approach does a pretty good job for being so basic.
Now think about what you could do if you had a system that could understand language. What if it could read? What if it had already read everything in the document collection you’re interested in, whether that’s a smallish collection like Wikipedia or potentially the whole web? How could that actually help you? Well, it could help in many ways. One is that you could just use more natural queries, stating your intent as you actually mean it, whether that’s a full sentence or a question, or just a little bit of a linguistic phrase, or just some person’s name. It could understand that better, and it could figure out what you want to do and how it can help. And then on the content side, if it could really read, it could do a much better job matching the meaning of your query to the actual meaning that’s there in the documents. Moreover, it could present the results for you better. You often have a challenge when you’re looking at search results: you see a little bit of a snippet, two lines’ worth of characters, and you have to figure out from that whether it is what you actually wanted. Because the systems we have today don’t actually understand the queries and don’t understand the documents, all they can really show you is where the keywords you asked for are matched, approximately in the right regions. But if they could understand both the documents and your query, then they could present results with, first of all, a better two lines, or potentially a whole new kind of presentation.
MA: Just to cut in for one second: the way I have heard you describe this before is that Google and other search engines look for keyword matches and then present results ranked according to some sort of algorithm that determines how important a page is. You’ve said before that what Powerset does is pre-read the content. It uses artificial intelligence to actually try to understand what sentences mean. In the Live Search blog post today, effectively the Microsoft announcement of the deal, they talked about a couple of examples: that a shrub and a tree are similar concepts, or that the word cancer could mean a disease or a horoscope sign. Ramez, maybe you want to jump in here too. How does that actually happen? A computer receives a sentence, your server sees a sentence; how does it actually start to parse that? Again, as non-technical as you can describe it.
BP: Okay, I’ll take that and Ramez, you can jump in on your examples.
BP: I guess one way to think about it is like when you are learning how to diagram sentences in elementary school. You draw these trees of a sentence and find that here is the noun phrase, and a noun phrase has a determiner like “the” and then a noun like “dog,” and here is a verb phrase, and it might have a verb like “barks.” And then what does it mean for that word? “Bark” is a verb and it has an “s” at the end, and the way that works, which we call morphology, is that it’s the present tense of that verb. The whole sentence is composed of those pieces, and so the meaning is built out of those. So you draw these diagrams when you are learning how to do it. And the kind of knowledge that a natural language processing system like Powerset is using is sort of like that. It’s basically extracting the surface structure, that kind of tree structure of a sentence, and then converting that into a series of different representations, ultimately into one which expresses that thing as facts. So it will basically say that there is a kind of activity here, it is a barking activity, and the thing that is doing that activity, the subject of that activity, is a dog. OK. So it is going from that surface structure of the language that you are seeing and converting it into a semantic, fact-based representation. In addition, it is then able to draw on the individual meanings of, and relationships between, words. So if the sentence said “The poodle barks,” then the system knows, if it can draw upon other knowledge about the relationships between words, as Powerset does, that poodles are a kind of dog. So if you as the user were to say, “I want dogs barking,” then it can match the concept of dog to the concept of poodle, and it is matching barking to barking. It is doing this sort of semantic match for you, which matches the document against words you are not even using in your query.
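The matching Pell describes can be illustrated with a toy sketch in Python. This is not Powerset's implementation: it assumes the subject and verb have already been picked out of each sentence (the parsing step is skipped entirely), and the `IS_A` table, `lemma`, `to_fact` and `matches` are hypothetical names invented here for illustration.

```python
# Hypothetical toy data: a tiny "is a kind of" table, not Powerset's knowledge base.
IS_A = {"poodle": "dog", "dog": "animal", "shrub": "plant", "tree": "plant"}


def ancestors(word):
    """Return the word followed by every broader concept reachable through IS_A."""
    chain = [word]
    while chain[-1] in IS_A:
        chain.append(IS_A[chain[-1]])
    return chain


def lemma(word):
    """Very crude morphology: strip a trailing 'ing' or 's' from longer words."""
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def to_fact(subject, verb):
    """Turn an already-identified subject and verb into a simple fact record."""
    return {"action": lemma(verb), "agent": lemma(subject)}


def matches(query_fact, doc_fact):
    """A document fact matches when the action is the same and the document's
    agent is the query's agent or a narrower kind of it."""
    return (query_fact["action"] == doc_fact["action"]
            and query_fact["agent"] in ancestors(doc_fact["agent"]))


doc_fact = to_fact("poodle", "barks")      # from "The poodle barks."
query_fact = to_fact("dogs", "barking")    # from "I want dogs barking"
print(matches(query_fact, doc_fact))
```

In this sketch a query about dogs finds a document about poodles, but a query about poodles would not match a document that only mentions dogs, since the lookup walks only from narrower to broader concepts.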
RN: I think everything that Barney said was right on. I think you see search engines, including Live Search and also Google and Yahoo, starting to do more work on matching more than exactly what the user entered, but it is usually limited to very simple things. So now all of us do some expansion of abbreviations or acronyms. If you type “NYC” in a search engine these days, in the last couple of years, it understands that it means the same thing as New York. These are very, very simple rule-based things, and no one understands that bark has one meaning if it is about a tree and a different meaning if it is about a dog. Or an example that someone gave the other day was the question “was so and so framed.” Framed could mean a framed picture, or it could mean set up for criminal activity that did not occur, and so on. And you have to actually understand something: if it is a person’s name, then one sense of the word framed applies; if it is not, then it doesn’t. So one of the things that Powerset brings that is unique is the ability to apply their technology to the user’s query in ways that go beyond simple pluralization or adding an “-ing.” The other is that Powerset also looks at the document; it looks at the words that are on a web page, and this is actually very important. If you look at just the user’s query, what you have available to figure out what they are talking about is three words, four words, five words, maybe even fewer. That can give you certain hints. If you look at a web page that has hundreds or thousands of words on it, you have a lot more information you can use, if you understand it linguistically, to tell what it’s about, what kind of queries it should match and what kind of queries it shouldn’t match.
And Powerset is fairly unique in applying this technology in the index on a fairly large scale already, and with Microsoft’s investment and long-term commitment we can scale this out even further and apply it to even more of the web, not just the Wikipedia content they have thus far.
MA: Ramez, how much work has Microsoft done in this area before today? Is it something that has been simmering, that you guys have been interested in? Do you have a number of people on staff who are experts in this area and have built technology around this? It would be interesting to know what you have done to date in this area.
RN: Well, Microsoft has some leading people in natural language processing. We have applied the idea in machine translation, translating from one language to another, and in other areas of natural language; even things like the grammar checker in Microsoft Word come out of our natural language work in some ways, and that is very exciting. The thing about the Powerset team is that it is purely additive. The people inside Microsoft Research I have talked to about this are extremely excited. They see the Powerset team in San Francisco as great collaborators and see this as a great chance to exchange data, ideas, tools, and so on. All of this is going to help us directly. Also, this is the first time we have had a focused team working just on natural language applied to search specifically, and not a broader area. With this kind of focused effort and the great technology that the Powerset team has built, we’ll be able to make really rapid progress.
MA: Where are your search engineers today? Are they in Washington, or in your Mountain View office?
RN: The bulk of our team is in Redmond, and we have a small team that is in Mountain View, as well.
MA: For now is Powerset staying in their San Francisco offices?
RN: Powerset is absolutely staying in San Francisco. They have a fantastic office. I plan on staying down there a couple days a week myself. It is a fantastic location, and we want to grow the team so we are looking for more and more qualified search engineers and more and more computational linguists to join the team at Powerset, and keep scaling up.
MA: Barney, how many of your current employees previously worked at Microsoft? Did anybody get hired back after leaving Microsoft?
BP: Actually I have not counted. I think we have a few Microsoft people, but it is not a high proportion.
MA: One of the things that has obviously hindered Powerset is that you need to index the entire web in a different way than search engines index it today because, as you say, you’re reading web pages instead of just noting keywords. Publicly you have said that you are not prepared to do that yet because it costs money, and you wanted to prove it out with the beta product that looks at Wikipedia first. Expense is obviously not as much of an issue now that you are part of Microsoft, so how long will it take? If you turned on the gas full blast now and wanted to launch a full version of Powerset that indexed the web, what is the fastest we could expect to see it?
BP: Umm, we are just getting together as a team to look at technical integration, at the best ways for our teams to work together, and at how we are going to combine and really leverage the resources that Microsoft has, so it is a little early to say how long it is going to take before you see it. What I can say…
MA: Barney you have become media trained.
MA: (Mocking) We are Microsoft. We cannot comment on future product releases. You gave me thirty seconds of nothing.
BP: No, no, I prefaced it (laughter). I am not finished yet. With all that said about what I can or can’t actually say, what I think I can say is that Powerset has already been doing some experiments processing web pages, arbitrary, random web pages, using our technology, and those results are looking pretty good. It is already a pretty parallel system, so to some extent the basic experience you see right now could be replicated just by running our technology over the larger set of content that Microsoft already has, on the machines that Microsoft already has. Now that doesn’t mean that you would get the full search experience, because there are all the rest of the features that Microsoft has developed that we would want to integrate to give a really coherent and good search experience. But take some of the things you see already, like the facts that Powerset extracts from the documents, the automatic building of profiles for any kind of concept, and the ability to show the pages with their automatically generated summaries. A lot of those features could really be done, at least to some level of quality, just by running on Microsoft infrastructure with resources that exist today. So we are going to have to figure out in what order we are developing what, but we feel that fundamentally the challenges of getting this up to web scale, the main barriers that were in our way, are now removed with Microsoft.
RN: Barney is really showing his media training here; I am really impressed. His answer is also spot on, and something to bear in mind is that, at this point, it has been primarily senior people across the teams who have been talking. And we really do have a very bottom-up culture inside of search. I think Powerset does as well. So we are going to connect more and more engineering teams now that we have announced this, and we can start working on detailed plans. What we have super high confidence in is that this is a great fit, with great people. The cultures are actually very similar, and this is right on strategy with what we see as the big barriers to customers getting high-quality results. You are going to see some short-term stuff. We are going to get some stuff out there that is available to you on the Live Search site before the end of this year, for sure. And then we are going to, as Barney was saying, take the current technology and start to scale it out. Will we go straight from Wikipedia to the entire web? Will we have some interim stuff? I am not sure yet. But we will start scaling it up and getting more and more benefit for customers over time.
MA: So do you think that you will launch this technology on Live Search, or will you launch something on Powerset and sort of keep the brands separate for a while? Or are you ditching the Powerset brand? Have you thought about that yet?
RN: We are going to keep Powerset alive. We think it is a fantastic technology showcase, and we will probably always have some things that are really interesting to play with and show people but that aren’t quite ready yet to be exposed to all of our customers. But the real payoff for us is integrating the Powerset technology deep within Live Search and making that product the one that really shines, in addition to Powerset. We want to take Powerset’s technology and really broaden it out and impact tens of millions of people, if not hundreds of millions of people, with the benefits of what Powerset brings.
MA: In December of last year, Peter Norvig, head of research at Google, was interviewed, and he said some things about natural language search that were interesting. I’ll link to this when we post the podcast, but I will quote him, and I would love to get your guys’ reaction to this on just a product and science level. “We don’t think it’s a big advance to be able to pose something as a question as opposed to keywords. Typing what is the capital of France won’t get you better results than capital of France.” To me that doesn’t really respond at all to what Powerset is promising to do, and what it is already doing with Wikipedia. But then he went on to talk about the limited value, in his opinion, of natural language search. He said, “We think that what’s important about natural language search is the mapping of words to concepts that users are looking for.” He gives some examples: New York is different from York, but Vegas is the same as Las Vegas, and Jersey may or may not be the same as New Jersey. “That is a natural language aspect that we are focusing on. Most of what we do is at the word and phrase level. We are not concentrating on the sentence. We think it’s important to get the right results rather than change the interface.” What is your response to that?
RN: I think what Peter Norvig is saying has some degree of accuracy, and he is also ignoring some things. Even in normal queries, queries that are not phrased as questions, there is a lot of linguistic structure. Say someone types in the query “2 bedroom apartments, under 1000 dollars, within a mile of Potrero Hill.” That query is loaded with linguistic content. And that’s a realistic query. That is the type of thing that customers actually want to find on the web. Today there is a sort of helplessness, where customers know that certain queries are too complicated, and they won’t even issue them to a search engine. They will go to some deep vertical search engine where they can enter different data into different boxes. What is the capital of France vs. capital of France: that is not really an area that is that interesting. But some of these more complex queries really are. For example, shrub vs. tree. If I do a search for decorative shrubs for my yard, and the ideal web page has small decorative trees for my garden, it really should match that page and bring it up as a good result. But today Google won’t do it, Yahoo won’t do it, and Live won’t do it. So even in these normal queries there is a lot of value in the linguistics.
BP: That’s right. So in addition, Powerset just launched the product, and I think that some of the features are really well called out in the iPhone product that we just launched. It’s just another version of our web site, but designed to be used on an iPhone. And Mike, you have sort of blogged about it. I have been using Powerset on a mobile device ever since we launched, and it’s kind of funny because you have very limited real estate, and you know what you want in your head, but you know it is going to take a long time for the pages to come up. I see a movie, Iron Man, and I want to know what other movies Jeff Bridges starred in. How do you want to ask that question? How do you want to get the information? You want to say, “What movies has Jeff Bridges starred in?” “Who was that blonde reporter in Iron Man?” How are you supposed to ask that? All these things we think in our head in language, and then we have to figure out how to translate them. It doesn’t mean that you should have to do more typing to get back worse results, but it means that you should be able to do anything in the most natural way possible. We are humans, language is our unique human endowment, yet we have not been able to take advantage of that when interacting with machines.
MA: Wait, wait, wait. So I have been an internet user for 13 years now roughly, and I know better than to type a sentence into a search bar. What I would do is…
BP: That’s the learned helplessness.
MA: Yes, but what I would do is type in Iron Man, and look up the name of the blonde reporter from there. I have learned to do that because I have been using the internet for so long. Do you think that anyone still searches that way, with long sentences? It seems we tried in the early days and realized it didn’t work. So, does anyone even bother searching that way? And a follow-up question would be, Barney, with regard to what you are seeing in the Wikipedia engine, are you seeing longer queries sort of slowly developing as people learn to speak to a search engine?
RN: Let me answer the question of whether anybody actually searches this way. The answer is yes, people do this. It isn’t the most common mode, but we do see that probably 5% of queries are natural language queries. These are not all queries that are phrased in complete sentences, but they are queries where the customer has issued something that has some sort of linguistic structure. Almost any query with a preposition or connective: X and Y, A near B, attribute A of Y, etc. Those things are loaded with linguistic structure.
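As a rough illustration of the point about prepositions and connectives, here is a minimal Python sketch that flags queries containing such words. It is only a heuristic written for this transcript, not how Live Search or Powerset classify queries; the `STRUCTURE_WORDS` list and the `has_linguistic_structure` function are assumptions made up for the example.

```python
# Hypothetical word list: a handful of prepositions, connectives and question
# words that hint at linguistic structure. A real system would need far more.
STRUCTURE_WORDS = {
    "of", "in", "on", "near", "under", "within", "between", "for", "to",
    "and", "how", "what", "who", "does",
}


def has_linguistic_structure(query):
    """Return True if the query contains any of the structure words."""
    cleaned = query.lower().replace("?", " ").replace(",", " ")
    return any(token in STRUCTURE_WORDS for token in cleaned.split())


for q in ["iron man", "timeline of Nvidia", "how to measure for draperies"]:
    print(q, "->", has_linguistic_structure(q))
```

A plain keyword query such as "iron man" is not flagged, while "timeline of Nvidia" and "how to measure for draperies" are. Note that this also flags "capital of France", so a word list alone cannot tell which structured queries actually benefit from deeper analysis.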
BP: So there’s a couple pieces. One was does anybody do things? I think we all have the experience – if you just get your most basic expression query and your system comes back with a result that’s good enough you’re done and you’re happy. Well, what is it that happens when you don’t get back the result the first time? You have that moment of frustration and you know you’re in for a project. What happens is that moment of prayer, where you’ve basically tried a few different versions and you’re just frustrated and how do you express your query? You express your prayer and you say just let me say what I want and I know I’m not going to get results, but darn it, I’m just going to poke.
MA: I think that’s why Yahoo Answers pops up so often, because they have those questions that are literally a quotation of the question. Somebody else has asked and answered it. That may not be the best resource for the answer, but it’s the best place the search engine can find to send me to.
RN: I have a list of some natural language queries in front of me. Can we just show you some queries that our customers have actually sent to us? These are random examples. The first person to see the dark side of the moon. How to get a credit card in Malaysia. Enabling system restore in group policy on domain controller. Timeline of Nvidia. How to measure for draperies. What is the difference between misses and women’s sizes? Does my baby have acid reflux? I could just go on and on. These fit in the category that we’ve labeled, which matches about five percent of queries, and they’re really just cases where the customer can’t think of a simpler way to express it.
BP: Now I’m going to elaborate, Mike, on the second part of your question, which was whether, since Powerset launched, we have seen that users are actually doing anything with natural language, and whether the queries look at all different.
BP: And the answer is absolutely yes. Our users have had absolutely no problem at all in throwing longer, more interesting, more complex queries at the system. You know, it’s just a flood of them, and so when we watched the initial queries come in at launch, it was kind of a fun moment for us because it was some sense of initial reputation. There was no issue about whether users, who use English or other ways of expressing themselves in all of their daily lives, could actually manage to do that with a search engine if given the chance. Absolutely: if users are given the chance, they do and they will. I want to go back to another point, though. We don’t want to harbor all on the query side and expression of intent, because all of these billions of documents you’ll look at are loaded with language. So there is the ability to read them in advance, extract the key information and then use that, even if you just did a small, simple query, by automatically generating a profile, as, for example, with Henry VIII, which I think you’ve blogged about, Mike. Or, when you’re reading an article, giving you the summaries of the article.
MA: You return answers, not web pages sometimes and that’s amazing.
BP: We return answers. We actually synthesize, so if you were to say, “What did Tom Cruise star in,” you actually get not just the movies, but the cover art for the different movies. It synthesizes multiple pieces of information to give you a whole different kind of presentation. Or, if you were just to say, “Bill Gates,” you’d be given an automatically generated profile of Bill Gates, pulled from across many, many articles. It’s no longer just about 10 links, although we certainly can (and will) do a more relevant job with the blue links, and a better job of presenting those links. With the language understanding systems which we now have, we can go way beyond that and open up a whole new door in user experience until you think, “Oh god, that’s how I used to search, now I want this whole new different kind of thing.” And now the question, which our users are asking, is how do I get this on the whole web, and with this partnership we’re now going to deliver.
MA: Ok, I’m out of questions. This was really helpful. There’s a million other things that I’d love to ask, but you’re not going to answer them yet. I look forward to seeing the Powerset technology launch with a full web index and Microsoft’s ranking technology behind it. I think it’s going to be great. Ramez are you promising a full launch, or some kind of launch by the end of the year? You mentioned that earlier in the podcast.
RN: What I’m saying is that by the end of the year you will definitely see Powerset technology improving the experience for customers on Live Search.
MA: Ok. Alright guys, thanks very much for your time and congratulations to both of you.
BP: Hey thanks Mike. Bye.