As the Web swells with more and more data, the predominant way of sifting through all of that data—keyword search—will one day break down in its ability to deliver the exact information we want at our fingertips. In fact, some argue that keyword search is already delivering diminishing returns—as the slide above by Nova Spivack implies. Spivack is the CEO and founder of semantic Web startup Radar Networks and is pushing his view that semantic search will help solve these problems. But anyone frustrated by the sense that it takes longer to find something on Google today than it did even a year ago knows there is some truth to his argument.
“Keyword search is okay,” he says, “but if the information explosion continues we need something better.” Today, there are about 1.3 billion people on the Web, and more than 100 million active Websites. As more people pile on, the amount of information on the Web keeps growing exponentially to accommodate all those seekers, and they themselves feel compelled to put their own personal and social information onto the Web as well.
At a certain point, with billions and billions of Web pages to sift through, keyword search just won’t cut it anymore. It’s a needle-in-the-haystack problem, with the haystacks just getting bigger and bigger every second.
Keyword search engines return haystacks, but what we really are looking for are the needles . The problem with keyword search such as Google’s approach is that only highly cited pages make it into the top results. You get a huge pile of results, but the page you want—the “needle” you are looking for—may not be highly cited by other pages and so it does not appear on the first page. This is because keyword search engines don’t understand your question, they just find pages that match the words in your question.
So how do we get beyond keyword search and Google’s PageRank? There are many approaches being tried: social search, tagging, guided search, natural-language search, statistical methods, open search, semantic search, and (way out there) artificial intelligence. They all have their problems. Tags are too messy and inconsistent. Natural-language requires too much computing power, is difficult to scale, and doesn’t deal with structured data well. Semantic search is perhaps the most promising, but it essentially requires every single Webpage to be re-written.
Spivack covered these issues during a presentation earlier this month at the Next Web conference in Amsterdam. It was one of the clearest explanations of the semantic Web I’ve heard so far (I’ve embedded his full slide show below). The semantic Web is nothing more than a set of standards that, if broadly adopted, would help computers extract meaning from the flood of data on the Web. But instead of a brute software approach, it puts intelligence into the data. “All you need to use that data is carried by the data itself,” says Spivack. Dumb software, smart data. That is an approach that scales no matter how many billions of Web pages are created.
The point, says Spivack, is:
To do for data what the Web did for documents.
You are turning the Web into a database, and your data becomes a part of it. Your data becomes part of the worldwide database. The semantic Web will let you move from data record to data record, just like you go from Web page to Web page.
There are many obstacles to the adoption of the semantic Web, but its goals are something worth striving for. What is certain is that search needs to evolve, and Google and Yahoo and Microsoft with it. Of course, they can adopt whichever approach or combination proves most effective.
The question is: Will they, or are they too wedded to keyword search to move beyond it?