By: Daniel Gwak and Ivy Nguyen, Point72 Ventures
Cloud computing seems ubiquitous, but full enterprise adoption is still bottlenecked by a real physics problem: the speed of light just isn’t fast enough. This limitation causes all sorts of problems with reliability and accuracy that prevent large enterprises from using the distributed databases of cloud computing for all of their activities. Since everything we interact with in the digital universe is supported by a database, building a system that can outrun the speed of light is a groundbreaking technical achievement that can unlock $100B in enterprise spending on databases and associated services. We believe that the team at Fauna has accomplished exactly this with FaunaDB, an OLTP database built for the cloud. That’s why our team at Point72 Ventures led Fauna’s $25M Series A round, the largest ever Series A for a database startup.
An expensive and expansive problem
In order to support mission-critical business activities, databases must be 100% correct, 100% of the time. For example, if your bank builds a mobile app, it had better not be wrong when it shows you a bank balance. When your bank balance trends low, it’s almost never a database error – your penchant for artisanal avocado toast is the more likely culprit. But showing you that account balance in your mobile banking app demands ‘correctness’ of a database in many ways: it should never lose a record, it should never read and write records out of order, it must respond quickly to lookups, it must never go down, and so on. When things do go wrong – as they recently did at Wells Fargo – customers lose access to their bank accounts, paychecks disappear, and instant outrage ensues.
Building databases is a tough job because few people have the skills and experience to build systems that can process tens of thousands of transactions a second with the kind of ironclad reliability that’s necessary. Most enterprises cannot do it in house.
That’s why, today, over 70% of the $50B database market is controlled by just three companies: Oracle, IBM, and Microsoft. Enterprises continue to use these databases (often running on expensive specialized hardware or mainframe systems) because they offer data correctness guarantees that are stronger than those of most other systems.
Why this breaks for cloud computing
Today, enterprises want to move more of their systems to the cloud in order to enjoy the cost and flexibility advantages of cloud computing. However, applications that live in the cloud are distributed systems — they don’t just live in one place on one machine anymore. Additionally, these applications are designed to respond to load patterns that are dynamic. Cloud computing enables you to scale out or scale in your application based on the current traffic to your application (e.g. to handle Black Friday spikes). Finally, to offer a snappy response time and a high quality user experience, data must live close to these applications.
Consequently, these cloud-based applications need databases that are designed to be distributed as well. And delivering mainframe-class data correctness and reliability in a distributed environment is a really hard technical problem, often considered a holy grail in the realm of computer science.
In terms of database design principles, the problem is one of tradeoffs. When data is distributed across many sites, you can pick either speed of access or correctness, but not both. For example, if you have a database with replicas in New York and Tokyo, each supporting tens of thousands of transactions per second, establishing the strict order in which transactions occurred is a real challenge when you consider that even light itself would take about 72 milliseconds to make the round trip – in fact, network traffic (which suffers various slings and arrows along the way) takes over 200 milliseconds between the two locations. To get around this constraint, you can choose to lock the database for the duration of the lag to ensure that there is no confusion (prioritizing correctness), or you can let each replica transact at full speed but agree to disagree on the order in which transactions happened (prioritizing speed).
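The 72-millisecond figure can be checked with a back-of-the-envelope calculation. The sketch below is only an illustration: the city coordinates, the mean Earth radius, and the helper names are our own assumptions, and it measures a straight great-circle path through a vacuum, which no real network follows.

```python
import math

SPEED_OF_LIGHT_KM_S = 299_792.458  # speed of light in a vacuum
EARTH_RADIUS_KM = 6371.0           # mean Earth radius (assumed spherical Earth)

# Approximate city-center coordinates (assumed for illustration).
NEW_YORK = (40.7128, -74.0060)
TOKYO = (35.6762, 139.6503)


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance between two (latitude, longitude) points, in km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def round_trip_ms(a, b):
    """Best-case round-trip time for light between two points, in milliseconds."""
    distance_km = great_circle_km(*a, *b)
    return 2 * distance_km / SPEED_OF_LIGHT_KM_S * 1000


if __name__ == "__main__":
    print(f"distance:   {great_circle_km(*NEW_YORK, *TOKYO):.0f} km")
    print(f"round trip: {round_trip_ms(NEW_YORK, TOKYO):.1f} ms")
```

This is a floor, not an estimate: light in optical fiber travels noticeably slower than in a vacuum, and cables and routers do not follow the great-circle path, which is consistent with the 200+ milliseconds observed in practice.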
Whether to choose a distributed database that favors speed or one that favors correctness depends on your application workload. Credit card processors choose correctness over speed, because users won’t notice the additional fractions of a second but would definitely notice if their transactions were recorded inaccurately. In other applications, prioritizing speed and giving up correctness is okay. For example, Facebook’s databases must handle over 1.6 billion reads per second during peak hours and are widely geographically distributed – so they give up on strict correctness. While you may not realize it, what you see might be different from what another user sees during these times. Unfortunately, many applications – such as stock exchanges where you need to know with extreme certainty which trader sold a stock and at what time and price – cannot make that tradeoff.
Breaking the enterprise cloud database bottleneck
We invested in Fauna because their technology minimizes this tradeoff between speed and correctness. It does so by commercializing the approach described in a research paper known as Calvin, written by renowned database researcher and academic Dr. Daniel Abadi and his research group.
In Calvin, Abadi described a consensus algorithm where multiple machines would pre-process transactions to agree on their order before executing them to completion. Thus, strict serializability of transactions would be achieved without the need for perfect knowledge of the time at which they occurred. Another place you may have heard of consensus protocols is in blockchains – with which Fauna shares some fundamental commonalities (without all the sketchiness of influencer-hyped coin offerings).
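The core idea, agreeing on an order first and executing afterwards, can be illustrated with a toy model. The sketch below is not Calvin’s or FaunaDB’s actual implementation: the `Transfer` type, the epoch structure, and the rule of sorting regions by name are all hypothetical stand-ins for the real agreement step. It only shows that once every replica holds the same ordered log and executes it deterministically, all replicas reach the same state without comparing timestamps.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transfer:
    """A hypothetical transaction: move `amount` from `src` to `dst`."""
    src: str
    dst: str
    amount: int


def build_global_log(epochs):
    """Merge per-region batches into one global order.

    `epochs` is a list of dicts mapping region name -> transactions submitted
    in that region during the epoch. Sorting regions by name is an assumed,
    deliberately simple stand-in for a real agreement protocol.
    """
    log = []
    for epoch in epochs:
        for region in sorted(epoch):
            log.extend(epoch[region])
    return log


def execute(log, initial_balances):
    """Apply the log in order; every replica runs exactly this logic."""
    state = dict(initial_balances)
    for tx in log:
        if state.get(tx.src, 0) >= tx.amount:
            state[tx.src] -= tx.amount
            state[tx.dst] = state.get(tx.dst, 0) + tx.amount
        # Otherwise the transfer is rejected, identically on every replica.
    return state


if __name__ == "__main__":
    balances = {"alice": 100, "bob": 0, "carol": 0}
    # Two conflicting transfers arrive in different regions in the same epoch.
    epochs = [{
        "tokyo": [Transfer("alice", "carol", 80)],
        "new_york": [Transfer("alice", "bob", 80)],
    }]
    log = build_global_log(epochs)
    new_york_replica = execute(log, balances)
    tokyo_replica = execute(log, balances)
    print(new_york_replica == tokyo_replica, new_york_replica)
```

Both replicas accept the New York transfer and reject the Tokyo one, because the order was fixed before either replica touched the balances. Which transfer wins is arbitrary in this toy rule; what matters is that every replica makes the same choice.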
Without a system like FaunaDB, companies have had to overcome this challenge through exotic (and expensive) engineering or by making business sacrifices. For example, many games require users to select a geographic ‘zone’ in which to compete – this is usually driven by a database tradeoff between speed (reducing lag in gameplay) and correctness (determining who pulled the trigger first). Imposing this sort of restriction can limit the enterprise’s geographical reach and stymie growth.
Other solutions in Fauna’s class can be prohibitively expensive or only partially solve the tradeoff. Google’s Spanner uses atomic clocks, fast networks and GPS to determine exactly when (and consequently, in what order) transactions occur anywhere in the world. Only atomic clocks work because traditional clocks tend to drift apart over time, owing to tiny physical variances in the quartz crystals that they use to keep time. This approach worked well for Google because they run their own cloud platform. Spanner cannot be deployed in third-party clouds or multi-cloud environments, and is therefore limited to a Google-only market. Some companies, such as Cockroach Labs, based their databases on Spanner’s protocol, but they lack access to atomic clocks and fast networks, and therefore cannot deliver the same guarantees that either Spanner or FaunaDB does.
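To put the clock drift mentioned above into numbers, here is a rough calculation. The 20 parts-per-million drift rate is an assumed, illustrative tolerance for an ordinary quartz oscillator, not a figure from Google, Cockroach Labs, or Fauna, and the function names are our own.

```python
DRIFT_PPM = 20            # assumed quartz drift rate, parts per million
SECONDS_PER_DAY = 86_400


def max_skew_seconds(drift_ppm, elapsed_seconds):
    """Worst-case disagreement between two unsynchronized clocks.

    Assumes one clock runs fast and the other slow by the same rate,
    so the gap between them grows at twice the drift rate.
    """
    return 2 * drift_ppm * 1e-6 * elapsed_seconds


def seconds_until_skew(drift_ppm, skew_seconds):
    """Time after which the two clocks may disagree by `skew_seconds`."""
    return skew_seconds / (2 * drift_ppm * 1e-6)


if __name__ == "__main__":
    per_day = max_skew_seconds(DRIFT_PPM, SECONDS_PER_DAY)
    to_one_ms = seconds_until_skew(DRIFT_PPM, 0.001)
    print(f"worst-case skew after one day: {per_day:.3f} s")
    print(f"time to reach 1 ms of skew:    {to_one_ms:.0f} s")
```

Under these assumed numbers, two machines could disagree by a full millisecond within about 25 seconds of being synchronized. At tens of thousands of transactions per second, a millisecond covers many transactions, so timestamps from ordinary clocks alone cannot settle which one came first.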
The Fauna team, led by Evan Weaver and Matt Freels, understood what enterprises actually need in a cloud-optimized database thanks to their experience building Twitter’s infrastructure as it grew into one of the largest web-scale distributed systems in the world. In order to support more enterprise applications in the cloud, the next generation of databases had to drastically minimize these tradeoffs between speed and correctness. More importantly, it had to be done in a way that’s accessible to all enterprises, not just the small handful of companies large and technical enough to employ the world’s best database engineers.
Enterprises spend $50 billion every year on antiquated database technology alone, and even more than that on the application logic necessary to compensate for functionality that should be table stakes in a cloud-native database. Building a database that solves these problems is a tremendous market opportunity because databases are one part of the stack that customers can’t build on their own (it’s too damn hard). That effectively funnels all the demand to the few players that can credibly build the product – and rarely have so few players credibly competed for such a large market. We’re excited to see how this market evolves. The cloud database wars have only just begun.