How Did Dropbox Scale To 175M Users? A Former Engineer Details The Early Days

Ladies and Gentlemen, we interrupt our normal programming about crazy entrepreneurs and even crazier VCs to bring you a little learning from the world of Engineering. Remember that? Recently Dropbox was in the news for revealing they’d hit 175 million users, and daring to say they could replace the hard drive. Big words. But what’s the engineering back-story of how they got there? How do successful startups scale, in technical terms, to hundreds of millions of users? It turns out that one of the ways Dropbox became successful was by creating a very simple and flexible platform early on.

Today in Budapest, Rajiv Eranki, who was previously head of server engineering at Dropbox, gave a room full of engineers at the RAMP conference a rundown of how they did it. He joined Dropbox in 2008 and left in 2011, as he wanted to tackle, he says, “new, different challenges” – one of which is potentially opening a cocktail bar in New York. (RAMP is run by some big companies that have emerged from Hungary: Prezi, Ustream and LogMeIn.)

After being lured away from a potential career in academia, Eranki joined when Dropbox had only 2,000 users. He worked at scaling up the platform with just one other person working full time on the back end. In those days, Dropbox had just one database machine and one front-end server.

Eranki told the audience how that early team did “a lot of things that weren’t efficient but did actually scale for thousands of users.”

So for instance, the first iteration of sharding on Dropbox was quite buggy. “Joins” across databases had to be separated, and there was a lot of ‘denormalisation’.
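To make the trade-off concrete, here is a minimal sketch of what separating joins and denormalising can look like once data is sharded. It is not Dropbox's actual code: in-memory dictionaries stand in for two database shards, and every name in it (`SHARDS`, `shard_for`, `add_file`, `files_with_owner_names`) and the modulo sharding scheme are assumptions made for illustration.

```python
# Hypothetical sketch: in-memory dicts stand in for separate database shards.
NUM_SHARDS = 2
# Each shard holds its own slice of the "users" and "files" tables.
SHARDS = [{"users": {}, "files": []} for _ in range(NUM_SHARDS)]


def shard_for(user_id):
    """Pick a shard by user id (simple modulo scheme, an assumption)."""
    return SHARDS[user_id % NUM_SHARDS]


def add_user(user_id, name):
    shard_for(user_id)["users"][user_id] = name


def add_file(owner_id, path):
    # Denormalisation: copy the owner's name onto the file row so that
    # listing files never needs a join against the users table.
    shard = shard_for(owner_id)
    owner_name = shard["users"][owner_id]
    shard["files"].append(
        {"owner_id": owner_id, "owner_name": owner_name, "path": path}
    )


def files_with_owner_names(user_ids):
    """Application-side 'join': query each user's shard, merge in Python."""
    rows = []
    for user_id in user_ids:
        shard = shard_for(user_id)
        rows.extend(f for f in shard["files"] if f["owner_id"] == user_id)
    return rows
```

The cost is visible in `files_with_owner_names`: what used to be one SQL statement becomes a loop over shards, and the copied `owner_name` has to be kept in sync by the application.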

That said, they “would not have changed a thing” and this kind of scrappy, slightly haphazard way Dropbox started up actually created some benefits in engineering terms.

They could run queries on the user behaviour very easily without having to write any special code. They could do Joins across databases as they needed. And the structure allowed for a lot of bug fixing as they could do queries in MySQL easily. Users with large numbers of shared folders only had to make one query of the database. Another benefit was that having just one front end meant the team only had one log to look at.
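As an illustration of the kind of ad hoc query a single database makes easy, the sketch below counts shared folders per user with one join. It uses Python's built-in `sqlite3` module as a stand-in for MySQL, and the `users` and `shares` tables, their columns, and the `heaviest_sharers` helper are all hypothetical, not Dropbox's schema.

```python
import sqlite3


def heaviest_sharers(conn, limit=3):
    """Return (email, shared_folder_count) pairs, most shares first."""
    return conn.execute(
        """
        SELECT u.email, COUNT(s.folder_id) AS n
        FROM users AS u
        JOIN shares AS s ON s.user_id = u.id
        GROUP BY u.id
        ORDER BY n DESC, u.email
        LIMIT ?
        """,
        (limit,),
    ).fetchall()


# Toy data in an in-memory database (made up for the example).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT);
    CREATE TABLE shares (user_id INTEGER, folder_id INTEGER);
    INSERT INTO users VALUES (1, 'a@example.com'), (2, 'b@example.com');
    INSERT INTO shares VALUES (1, 10), (1, 11), (2, 10);
    """
)
```

With both tables in one database, a question about user behaviour is a single statement; no special code is needed beyond the query itself.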

All this meant they “gained tremendous flexibility and scalability,” said Eranki.

For starters, not separating the database out from the start meant they could do things that would normally require lots of work otherwise. Another great learning from the early days was this: They used Python for everything. And it worked.

It meant that after one million users the whole platform was still only running on hundreds of lines of code, instead of thousands. By using Python for it all “we could get to 40m users without having to write thousands of lines of C code.” Even the client app was written in Python.

Out of this came some lessons, for instance about app-specific metrics such as ease and ‘foolproofness’. Plus, it emerged that “most graphs are useless”. Instead they built dashboards to analyze performance; they always put limits on values (like failed log-ins); and they kept some slack, such as memcaching extra queries and delaying the optimization of SQL queries.
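One way to picture a cap on a value like failed log-ins is a sliding-window counter. The sketch below is an assumption about how such a cap could be written, not a description of what Dropbox built; the class name `FailedLoginLimiter`, its parameters, and the default thresholds are all invented for the example.

```python
import time
from collections import defaultdict, deque


class FailedLoginLimiter:
    """Block a key (e.g. an account) after too many recent failed log-ins."""

    def __init__(self, max_failures=5, window_seconds=600, clock=time.monotonic):
        self.max_failures = max_failures
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the window can be tested
        self._failures = defaultdict(deque)  # key -> failure timestamps

    def _prune(self, key):
        # Drop failures that have aged out of the window.
        cutoff = self.clock() - self.window_seconds
        timestamps = self._failures[key]
        while timestamps and timestamps[0] <= cutoff:
            timestamps.popleft()
        return timestamps

    def record_failure(self, key):
        self._prune(key).append(self.clock())

    def is_blocked(self, key):
        return len(self._prune(key)) >= self.max_failures
```

A log-in handler would call `is_blocked` before checking the password and `record_failure` after a wrong one, so a runaway client or an attacker hits a ceiling instead of generating unbounded load.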

Eventually, it turned out that “users” who used Dropbox the most – like almost constantly – were either using it illegitimately (like trying to use it as a CDN) or were simply the result of bugs. It was the second-biggest group of Dropbox users – the core legitimate users – and the categories of behavior they exhibited that ended up suggesting how Dropbox could evolve as a real business. It’s moments like that when a mere product can turn into a multi-million-user business.

Eranki also came up with some great startup lessons.

He said that every time they tried to anticipate things or “be clever in advance” they failed. In fact, it was much easier to just stay on top of the architecture as it grew and keep tabs on it.

To avoid ‘Murphy’s Law’ of things going wrong, they would do things like take web servers and hard reboot them just to see if they would restart themselves.

The team also found it was easier to keep log data rather than delete old code – usually there would be a need for it later on for whatever reason. “Delete nothing unless necessary,” said Eranki. A major conclusion of those early days: Be sceptical about adopting new technology.

Eranki shared some things they also did wrong.

They did not keep good track of downtime or degraded performance. And with hiring, they found they should have started sooner, and that things worked best when they hired people who were connected to the company in some way or already knew the company. From this they learned to hire more people who were in turn capable of attracting more potential hires.

In the end, Eranki said that his early Dropbox team found that “being clever about architecture in advance is hard” and “scaling for us was more about prioritizing projects… and building process.”

Asked if Dropbox could scale to a billion users from its current 175 million, he said yes it could. After all, that’s only about six times bigger than it is today.