Pinecone vector database can now handle hybrid keyword-semantic search

When Pinecone announced a vector database at the beginning of last year, it was building something that was specifically designed for machine learning and aimed at data scientists. The idea was that you could query data in a format that machines understand, making queries much faster.

Originally this involved semantic search, where users could search based on meaning instead of specific words. It turned out, however, that as people put Pinecone to work, there were use cases where specific keywords mattered, and today the company announced that it’s now possible to run searches that combine semantic and keyword matching, which company founder and CEO Edo Liberty calls hybrid search.

“We’ve conducted a lot of research on this topic and we found that, in fact, hybrid search ends up being better [in many cases]. It’s better in the sense that if you can combine both semantic search, this is the deep NLP encoding of sentences that gets the context and the meaning and so on, but you can also infuse that with specific keywords…the combination of those two ends up being significantly better,” Liberty told TechCrunch.

In fact, he says, the two complement each other well, especially in cases where industry-specific terms matter. One example is a doctor searching for information on a specific disease. In a medical context like that, combining a natural-language question with keywords tied to the disease may return better results.

He says that the keywords never take precedence over the semantic question the user is asking, but they provide some extra information to help return more meaningful results.

“You might know exactly what you’re looking for, and you might be able to provide extra oomph when you make your semantic search keyword-aware, and that actually helps a lot. So I don’t want to throw away the good parts of keyword search [by relying completely on semantic search]. I don’t want the keywords to be in the driver’s seat, but I don’t want to ignore them completely either,” he said.
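To make the idea concrete, here is a minimal Python sketch of one common way to blend the two signals: a weighted sum of a semantic similarity score and a keyword-match score. This is an illustration only and not Pinecone's implementation or API. The function names, the toy two-dimensional vectors, the simple term-overlap keyword score and the `alpha` weight are all assumptions made for this example.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (the semantic signal)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def keyword_overlap(query_terms, doc_text):
    """Fraction of query keywords that appear in the document (the keyword signal)."""
    terms = [t.lower() for t in query_terms]
    if not terms:
        return 0.0
    doc_terms = set(doc_text.lower().split())
    return sum(1 for t in terms if t in doc_terms) / len(terms)


def hybrid_score(query_vec, doc_vec, query_terms, doc_text, alpha=0.8):
    """Weighted blend of both signals.

    alpha is a hypothetical tuning knob: values above 0.5 keep the
    semantic score dominant, so keywords only nudge the ranking.
    """
    semantic = cosine(query_vec, doc_vec)
    keyword = keyword_overlap(query_terms, doc_text)
    return alpha * semantic + (1 - alpha) * keyword


# Toy data: both documents are equally close to the query in meaning,
# but only the first one contains the disease name the user typed.
query_vec = [1.0, 0.0]
query_terms = ["sarcoidosis"]
docs = {
    "a": ([1.0, 0.0], "treatment options for sarcoidosis"),
    "b": ([1.0, 0.0], "treatment options for lung conditions"),
}

ranked = sorted(
    docs,
    key=lambda d: hybrid_score(query_vec, docs[d][0], query_terms, docs[d][1]),
    reverse=True,
)
print(ranked)
```

With `alpha` set above 0.5, the keyword score can break ties or nudge the ranking without overriding the semantic match, which is roughly the balance Liberty describes. Production systems typically use a more refined keyword signal than simple term overlap.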

As Liberty told us at the time of the company’s $28 million Series A last year, search has become a big use case for the company.

“The predominant use of the vector databases is for search, and search in the broad sense of the word. It’s searching through documents, but you can think about search as information retrieval in general, discovery, recommendation, anomaly detection and so on,” he said at the time.

Pinecone launched in 2019 and has raised $38 million, per Crunchbase.