Traditionally, a data lake was a place to store amorphous unstructured data, while a data warehouse was where you put highly structured data like credit card information. That began changing in recent years as companies saw a need to merge the two concepts, and the lakehouse idea, which combines the scale of the data lake with the computational power of the data warehouse, began to take shape.
Vinoth Chandar, founder and CEO of Onehouse, was working at Uber in 2014. The company was growing fast and had a serious data problem. It had a lot of unstructured data sitting in a data lake, and the challenge was to find a way to process that data more quickly, something you could do with a data warehouse but that was harder in a data lake because of its unstructured nature.
Part of the problem was the sheer scale of Uber’s data, which made it difficult if not impossible to move around using the data lake technology available at the time. “If you wanted fast data, you used the stream processing stack [like Kafka and Confluent], right? And data lakes let you scale to huge amounts of data,” he said. But at that scale, the size of the data limited how quickly you could work with it.
In 2016, he came up with the idea to combine the two concepts to give him scale and speed. “So this idea was pretty wild, in a sense that it brought a lot of a database tech structure on top of the data lakes,” he said. He created a tool to do this called Hudi, which Uber donated to the Apache Software Foundation as an open source project the following year.
The project began to gain attention, and companies needing this kind of processing capability on top of their data lakes began lining up. They include some pretty big names like Amazon, Walmart, GE Aviation, Robinhood and TikTok. Chandar said that these companies are building massive, exabyte-scale data lakes with Hudi.
Even after Chandar left Uber and worked briefly at Confluent, he remained chair of the Hudi project and began thinking of building a company based on the open source project. He launched Onehouse early last year and raised $8 million from Greylock and Addition to begin building the managed version of the software.
As is often the case with companies built on open source, some larger organizations have the engineering resources to install and run the project on their own, but many need help. A managed version removes much of the complexity and many of the headaches of operating it yourself.
The startup has 15 employees with plans to double that number this year. Chandar said he’s looking to build a diverse company, and even the name was selected with that in mind — a company where everyone is welcome.
“We want to make sure everybody in the company has a seat at the table. And one of our engineers actually came up with the name Onehouse, and we actually took that [to show] that we want to create a very diverse and inclusive place here. And we currently have three different self-identified ethnicities within the company, and we think the announcement is going to help us as well reach beyond our networks and help bring more diversity.”
The company plans to use the seed capital to keep growing, build out the managed product and continue contributing to the Hudi open source project.