Data is the lifeblood of the modern corporation, yet acquiring, storing, processing, and analyzing it remains a remarkably challenging and expensive project. Every time data infrastructure finally catches up with the streams of information pouring in, another source and more demanding decision-making makes the existing technology obsolete.
Few cities rely on data the same way as New York City, nor has any other city so shaped the technology that underpins our data infrastructure. Back in the 1960s, banks and accounting firms helped to drive much of the original computation industry with their massive finance applications. Today, that industry has been supplanted by finance and advertising, both of which need to make microsecond decisions based on petabyte datasets and complex statistical models.
Unsurprisingly, the city’s hunger for data has led to waves of database companies finding their home in the city.
As web applications became increasingly popular in the mid-aughts, SQL databases came under increasing strain to scale, while also proving to be inflexible in terms of their data schemas for the fast-moving startups they served. That problem spawned Manhattan-based MongoDB, whose flexible “NoSQL” schemas and horizontal scaling capabilities made it the default choice for a generation of startups. The company would go on to raise $311 million according to Crunchbase, and debuted late last year on NASDAQ, trading today with a market cap of $2 billion.
At the same time that the NoSQL movement was hitting its stride, academic researchers and entrepreneurs were exploring how to evolve SQL to scale like its NoSQL competitors, while retaining the kinds of features (joining tables, transactions) that make SQL so convenient for developers.
One leading company in this next generation of database tech is New York-based Cockroach Labs, which was founded in 2015 by a trio of former Square, Viewfinder, and Google engineers. The company has gone on to raise more than $50 million according to Crunchbase from a luminary list of investors including Peter Fenton at Benchmark, Mike Volpi at Index, and Satish Dharmaraj at Redpoint, along with GV and Sequoia.
While web applications have their own peculiar data needs, the rise of the internet of things (IoT) created a whole new set of data challenges. How can streams of data from potentially millions of devices be stored in an easily analyzable manner? How could companies build real-time systems to respond to that data?
Mike Freedman and Ajay Kulkarni saw that problem increasingly manifesting itself in 2015. The two had been roommates at MIT in the late 90s, and then went on separate paths into academia and industry respectively. Freedman went to Stanford for a PhD in computer science, and nearly joined the spinout of Nicira, which sold to VMware in 2012 for $1.26 billion. Kulkarni joked that “Mike made the financially wise decision of not joining them,” and Freedman eventually went to Princeton as an assistant professor, and was awarded tenure in 2013. Kulkarni founded and worked at a variety of startups including GroupMe, as well as receiving an MBA from MIT.
The two had startup dreams, and tried building an IoT platform. As they started building it though, they realized they would need a real-time database to process the data streams coming in from devices. “There are a lot of time series databases, [so] let’s grab one off the shelf, and then we evaluated a few,” Kulkarni explained. They realized what they needed was a hybrid of SQL and NoSQL, and nothing they could find offered the feature set they required to power their platform. That challenge became the problem to be solved, and Timescale was born.
In many ways, Timescale is how you build a database in 2018. Rather than starting de novo, the team decided to build on top of Postgres, a popular open-source SQL database. “By building on top of Postgres, we became the more reliable option,” Kulkarni said of their thinking. In addition, the company opted to make the database fully open source. “In this day and age, in order to get wide adoption, you have to be an open source database company,” he said.
Since the project’s first public git commit on October 18, 2016, the company’s database has received nearly 4,500 stars on Github, and it has raised $16.1 million from Benchmark and NEA.
Far more important though are their customers, who are definitely not the typical tech startup roster and include companies from oil and gas, mining, and telecommunications. “You don’t think of them as early adopters, but they have a need, and because we built it on top of Postgres, it integrates into an ecosystem that they know,” Freedman explained. Kulkarni continued, “And the problem they have is that they have all of this time series data, and it isn’t sitting in the corner, it is integrated with their core service.”
New York has been a strong home for the two founders. Freedman continues to be a professor at Princeton, where he has built a pipeline of potential grads for the company. More widely, Kulkarni said, “Some of the most experienced people in databases are in the financial industry, and that’s here.” That’s evident in one of their investors, hedge fund Two Sigma. “Two Sigma had been the only venture firm that we talked to that already had built out their own time series database,” Kulkarni noted.
The two also benefit from paying customers. “I think the Bay Area is great for open source adoption, but a lot of Bay Area companies, they develop their own database tech, or they use an open source project and never pay for it,” Kulkarni said. Being in New York has meant closer collaboration with customers, and ultimately more revenues.
Open source plus revenues. It’s the database way, and the next wave of innovation in the NYC enterprise infrastructure ecosystem.