When you face the kind of scale that Reddit does, with over 300 million monthly active users generating 5 million comments and a staggering 40 million searches every day across more than a million communities, it's a daunting task to find a search tool that can handle that kind of volume.
The challenge for Reddit extends beyond indexing these massive numbers. It also has to deal with a wide variety of content, with text, GIFs, images and video by the score. Part of the goal was to improve traditional search functionality and deliver more relevant results, but perhaps even more critically, the company wanted a tool to help users surface the subjects that interest them without having to type them explicitly into the search box, Nick Caldwell, VP of engineering at Reddit, explained.
“I think that people who come to any site, and Reddit in particular, prefer an experience where they don’t have to do manual keyword entry, but want a continuous stream of interesting content,” he said.
Reddit's search engine had been notoriously bad, and Caldwell made upgrading it a priority. "One of the things I wanted to do when I started at Reddit was I wanted to fix [search]. People have been complaining about it for five years," he said.
Up to that point, the problem wasn't a lack of desire to improve the search experience. Everyone understood that search needed work, but finding the time to update it was another matter. When Caldwell came on board, Reddit had a small team of just 40 engineers, whose primary job was keeping a site of this size and scope up and running.
Caldwell said the company went with the Lucidworks Fusion platform because it offered the right combination of technology and the ability to augment his engineering team, while helping search on Reddit continue to evolve. Buying a tool was only part of the solution, though. Reddit also needed to hire engineers with what Caldwell called "world class search and relevance engineering expertise." To that end, he has set up a 30-person search engineering team devoted to maximizing the potential of the new platform.
Lucidworks Fusion is built on the open source search engine Apache Solr, but company CEO Will Hayes says the commercial product has been built to scale to Reddit-like proportions. "Solr is the core engine. We still heavily contribute to the open source project, but we put a lot of focus on how people consume data," Hayes explained.
That means working in a streaming fashion across billions of records in near real time, while using analytics and machine learning to understand the underlying data and deliver more relevant results and content to Reddit users.
Today’s search update is part of Reddit’s wider campaign to update the site’s look and feel, which became an organizational priority after the site’s two founders returned to the company — with Alexis Ohanian coming back in 2014 and Steve Huffman in 2015.
"With Steve and Alexis coming back, they brought to the table that the site should be more welcoming and engaging than it has been in the past. It took the leadership of Steve and Alexis to see that the content we have is really a gold mine, and we have to find a way to present it to users to unlock that potential," Caldwell said.
While Lucidworks remains an active partner in the project, Caldwell hopes his team will be able to take over by the end of the year. He says the ultimate goal is a tool that is not only more relevant but also looks better and is more engaging.