NetBase Offers Powerful Semantic Indexing Platform That Reads The Web

Regular search engines such as Google and Yahoo use statistics to make sense of the Web. They count links, keywords, and other items on a page to determine its rank in search results. Semantic search engines try to actually understand the meaning of the words found on the Web and other documents to bring back the most relevant results to a query. Microsoft bought Powerset for $100 million to gain semantic search expertise, but so far all it can search is Wikipedia. Hakia, Textwise, and other startups are also working on semantic search. Now comes NetBase, which brings a slightly different approach that it says can scale to the entire Web.

NetBase has been around for a while. Originally called Accelovation, it has raised $9 million in two rounds of venture funding over the past four years, has 30 employees, and counts among its current customers P&G, Caterpillar, 3M, BP, Kraft, BASF, and Goodyear. It is now changing its name and offering its core semantic indexing technology as a platform for other companies to build their own products. Already, scientific publisher Elsevier uses NetBase to power its Illumin8 research tool for searching scientific articles, patents, and Websites.

NetBase takes a sophisticated linguistic approach, actually diagramming sentences to determine the relationship between words and phrases. It does particularly well with causal relationships, allowing it to tease out cause and effect from raw text. For instance, in the sentence, “The calcium, potassium and magnesium found in yogurt can help reduce your risk for hypertension often resulting from stress, obesity, and other factors,” NetBase can identify that “stress” and “obesity” are causes of hypertension and that “calcium,” “potassium,” “magnesium,” and “yogurt” can be used to counter hypertension.
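
To make the idea concrete, here is a toy sketch of pulling cause-and-effect pairs out of that example sentence. It is not NetBase's method: the company describes full sentence diagramming, while this uses two hand-written regular expressions that only fit this one sentence shape. The function names, the relation labels "causes" and "counters", and the patterns themselves are all assumptions made for illustration.

```python
import re

SENTENCE = (
    "The calcium, potassium and magnesium found in yogurt can help reduce "
    "your risk for hypertension often resulting from stress, obesity, and "
    "other factors"
)


def split_list(text):
    """Split a comma/'and'-separated list of terms into individual items."""
    parts = re.split(r",\s*(?:and\s+)?|\s+and\s+", text)
    items = [p.strip() for p in parts if p.strip()]
    # Drop vague placeholders such as "other factors" (illustrative rule).
    return [item for item in items if not item.startswith("other ")]


def extract_relations(sentence):
    """Return (subject, relation, object) triples found by two toy patterns."""
    relations = []

    # Hypothetical pattern 1: "<effect> often resulting from <causes>"
    match = re.search(r"(\w+) (?:often )?resulting from (.+?)(?:[.]|$)", sentence)
    if match:
        effect = match.group(1)
        for cause in split_list(match.group(2)):
            relations.append((cause, "causes", effect))

    # Hypothetical pattern 2:
    # "The <agents> found in <source> can help reduce your risk for <condition>"
    match = re.search(
        r"The (.+?) found in (\w+) can help reduce (?:your )?risk for (\w+)",
        sentence,
    )
    if match:
        agents = split_list(match.group(1)) + [match.group(2)]
        for agent in agents:
            relations.append((agent, "counters", match.group(3)))

    return relations


if __name__ == "__main__":
    for triple in extract_relations(SENTENCE):
        print(triple)
```

Fixed patterns like these break as soon as the wording changes, which is presumably why a system meant to cover the entire Web would parse sentence structure rather than match surface strings.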

The company has already indexed about 8 billion Web pages and processes 100 billion sentences a month through its semantic parsing. Once it identifies causes, effects, and other relationships, it can serve them up in search results along with top-ranked links. For instance, a health-related search could turn up a guide that includes related symptoms, causes, drugs, and treatments. The technology also lends itself to Q&A types of searches. You could ask, “What companies are developing semantic search technologies?” and it will return a list of companies along with the snippets that mention each company and semantic search.

I’ve tried a few demo searches set up to do various things such as provide the pros and cons of a product, the companies in a particular market, or causes and effects of a medical problem. The results were impressive. On the whole, I’d say they were at least 70 percent relevant, compared to the much larger proportion of irrelevant links I get when I do a Google search. But it was slow. NetBase took 5 seconds or more to return results, something it says won’t be as big an issue in production versions of its technology.

NetBase is not building its own search engine, although it plans to create a health-related search engine around PubMed content as a proof of concept. Instead, it is targeting large publishers and companies that want to create their own vertical search tools, which combine data on the Web with their own databases of content. This is definitely an enterprise play. Licensing starts at about $100,000 and goes up from there.