Etsy’s 2-year migration to the cloud brought flexibility to the online marketplace

'Now we have, by far, the majority of our engineers working on product features that customers see every day'

Founded in 2005, Etsy was born before cloud infrastructure was even a thing.

As the company expanded, it managed all of its operations in the same way startups did in those days — using private data centers. But a couple of years ago, the online marketplace for crafts and vintage items decided to modernize and began its journey to the cloud.

That decision coincided with the arrival of CTO Mike Fisher in July 2017. He was originally brought in as a consultant to assess how running its own data centers affected Etsy's ability to innovate. As you might expect, he concluded that it was holding the company back, and that assessment eventually led to Etsy hiring him to lead a long-term migration to the cloud.

That process concluded last month. This is the story of how a company born in data centers made the switch to the cloud, and the lessons it offers.

Stuck in a hardware refresh loop

When Fisher walked through the door, Etsy operated out of private data centers. It was not even taking advantage of a virtualization layer to maximize the capacity of each machine. The approach meant IT spent an inordinate amount of time on resource planning.

“We were still bare metal, so we didn’t have a virtualization layer and the team had to spend a lot of time managing capacity and getting the hardware right. You couldn’t just spin up new hardware. To do that took months of cycle time,” Fisher told TechCrunch.

In fact, the company was on 12-month cycles planned around annual budgets. It needed six months to order the hundreds of machines it would require for the following year, along with time to configure, rack and stack them.

Fisher saw this as a perfect scenario for the cloud.

Leaving the data center behind

When he came on board in 2017, he pushed for a move to the cloud. The goal was to break out of the endless hardware refresh loop and put the company's engineers to work on more pressing problems.

“One of the first things that I said when I joined full time was that ‘we’ve got to get out of this [cycle] and get to the cloud because we need more of our engineers working up the stack closer to the customers,’ ” he said.

That's when the company began to look seriously at the public cloud. The move has since let Etsy get its engineers more focused on customer requirements, but it wasn't always easy getting everyone pointed in that direction.

Becoming a change agent

Going from a data center-driven strategy to the public cloud might look like a reasonable change on paper, but people who are used to working a certain way have to implement the change and learn new ways of working. Neither of these steps is easy in a large organization.

Fisher certainly recognized this. He had been working as a consultant for more than a decade, helping organizations make these very kinds of changes, and knew there would be challenges moving the team ahead.

He also recognized that there are advantages to running your own data centers: You are in control of every aspect of the operation. You don’t have to worry about latency issues or “noisy neighbors” who can degrade your operation’s performance.

"There were definitely technical concerns and concerns about change, but overall we were able to convince people and a lot of that was done through testing," he said. By moving slowly and proving they could run the marketplace in the cloud, the team showed everyone that the plan was within reach.

Getting started

Before they got to that point, the first step was to select a vendor, which meant looking at the usual suspects: Amazon, Google and Microsoft. They began the process of finding a vendor/partner in summer 2017, right after Fisher started at Etsy. They wrote an RFP and broke down what was a massive project into a series of smaller, more manageable chunks.

One major point they kept front and center was the sheer amount of data generated by Etsy's marketplace: more than a billion data points a day. The company had been using machine learning to process and understand all of that data, so it wanted a cloud vendor that could help there. Another important requirement was that the vendor be more than just a provider; it needed to be a trusted partner that could work with them.

Fisher says after a couple of months of talking to vendors and listening to presentations, Google really stood out, especially on machine learning. “That’s one of the areas that we think Google really specialized in. Even though they were a little bit later to the game than Amazon and Microsoft, they have a specialty around big data and machine learning that’s just central to their company,” he said.

Once Etsy made the decision to go with Google, the rest was negotiation, and it signed the contract in December 2017.

Moving along

With a vendor in place, the company had to start moving workloads to the cloud, and that required some tactical decisions. Would it move workloads as is, lifting and shifting them to the cloud along with the old ways of operating, or would it rearchitect them to take advantage of the cloud?

“We honestly did a mix. With some of the parts of the system, we wanted to get there sooner rather than later, and some of that was just timing for our general seasonality,” said Fisher. “Some of it was because we knew that if we got some early wins, organizationally, that’s a big deal.”

He added, "we lifted and shifted some of the parts of the system, and basically they're running exactly like they would in the data center, but now in the cloud. Other parts, we really optimized and are taking the time to run it on different parts of systems so that in a cloud, it can auto-scale and use all of the benefits or even use their services." In those cases, that meant moving a workload off Etsy's internal systems and onto a Google service such as BigQuery.

Best-laid plans

Fisher says that the team put together a spreadsheet of all the tasks involved in making its initial moves to the cloud.

“We had a pre-flight checklist of sorts of every task that needed to be done. We had an owner of who was going to do it. We had a backup person who could do it in their stead if they were tied up with something else. We had the expected time, and all of this was tracked in real time. We had this incredibly orchestrated process,” he said.

But of course, even with all that planning, things did go wrong. They went wrong badly enough that the team had to halt its first attempt at the move, roll back to the data center and regroup the next morning. The team had set strict time windows for completing the cutover; if it couldn't finish within a window, the default was to roll back and try again.

On the second attempt, Fisher said, the team resolved the final issues and cut over to the cloud with just two minutes left before the window closed.

Bringing it home

Last month the company announced it had completed its transition to the cloud. Fisher said that while the cloud has many advantages, he still misses some of the control that comes with running your own data centers. But, he added, it's a feeling you get used to, and you rebuild your internal processes to account for it.

The ultimate goal of migrating to the cloud was getting Etsy’s engineering team focused less on the nuts and bolts of running the back end so they could put more time and energy into helping customers. “Now we have, by far, the majority of our engineers working on product features that customers see every day, and less and less having to manage this very low-level infrastructure stuff.”

That's what moving to the cloud has done for Etsy.