Hardware generally doesn’t interest me much, so when I heard about the Open Compute Project I didn’t pay it much attention. Casually reading up on the subject left me even less interested. Why should Facebook have to design their own hardware, I wondered? Wouldn’t hardware vendors be clambering over one another to supply Facebook with gobs and gobs of servers for their data centers?
Amir Michael, Facebook’s hardware lead, discussed the Open Compute Project in a keynote presentation at LinuxCon. He laid out the root problem: hardware manufacturers, in an effort to provide differentiation, were actually creating more problems than they were solving. The on-system instrumentation that OEMs provided for Facebook created additional complexity, and ultimately wasted space and generated unnecessary heat.
The HPs and Dells and IBMs of the world had established a very successful business for themselves selling servers with their own customizations, and in smaller quantities those customizations did provide some modicum of benefit to their customers. When you’re buying several hundred servers from a single manufacturer, that manufacturer’s management tools are easy enough to work with.
But when you’re buying several thousand servers at a time from multiple vendors, the different management tools simply get in the way. The differences between chassis designs and motherboard layouts complicate service issues for the data center staff.
Facebook made the remarkable decision to solve this problem themselves. They designed their own power supply, which reached 95% efficiency. They designed a vanity-free server case, which gave technicians easier access and, as an unexpected benefit, improved heat dissipation. They went on to design a motherboard with no cruft: just the absolute essentials for their computing requirements. This mainboard was cheaper to produce, and also had improved thermal properties. And finally, Facebook redesigned the venerable server rack to make it substantially easier to access, move, and deploy.
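To get a feel for why that 95% power supply efficiency figure matters at scale, here is a back-of-the-envelope sketch. The 95% figure comes from the talk; the 80% baseline, per-server load, fleet size, and electricity price are my own illustrative assumptions, not anything Facebook published.

```python
# Rough sketch: wall power wasted by PSU inefficiency at fleet scale.
# 95% is the Open Compute figure; everything else is an assumption.

def wall_power_watts(load_watts: float, efficiency: float) -> float:
    """Power drawn from the wall to deliver `load_watts` to the components."""
    return load_watts / efficiency

def annual_kwh(watts: float) -> float:
    """Convert a continuous draw in watts to kilowatt-hours per year."""
    return watts * 24 * 365 / 1000

SERVERS = 10_000       # assumed fleet size
LOAD_W = 300.0         # assumed per-server component load, watts
PRICE_PER_KWH = 0.10   # assumed electricity price, USD

baseline = wall_power_watts(LOAD_W, 0.80) * SERVERS  # typical legacy PSU
improved = wall_power_watts(LOAD_W, 0.95) * SERVERS  # Open Compute PSU

saved_kwh = annual_kwh(baseline - improved)
print(f"Annual energy saved: {saved_kwh:,.0f} kWh")
print(f"Annual cost saved:  ${saved_kwh * PRICE_PER_KWH:,.0f}")
```

Even with these made-up numbers, the gap works out to millions of kilowatt-hours a year for a ten-thousand-server fleet, which is why a few percentage points of PSU efficiency is worth a custom design at Facebook's scale.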
An important, but oft-overlooked ancillary benefit to Facebook’s vanity-free and minimalist designs is that they involve less waste, both in production and in disposal. When you’re buying thousands of servers, this becomes a very important ecological issue. Computer waste is a serious environmental concern, and too many consumers of technology ignore the consequences of disposal.
Recognizing that their data center headaches couldn’t possibly be unique, Facebook shared all of their design specifications, CAD drawings, and reference materials under open licenses through their newly formed Open Compute Project.
The reason for this decision, as Michael said in his presentation, is that “openness always wins.” He pointed to the advent of the USB standard as the perfect illustration of this point. Prior to USB, the PC industry was plagued with finicky peripherals and an abundance of substandard interface options. USB, developed openly and in collaboration with multiple interested parties, reshaped the peripheral market into what we enjoy today.
My first question to Michael was “Why didn’t the market solve this problem?” Specifically, why didn’t any of Facebook’s hardware vendors recognize the problem and address it? He pointed out that the bulk of the work began in 2009, when Facebook was considerably smaller than it is today. None of Facebook’s vendors really saw the scale to which Facebook could grow, and as such didn’t see a need to change their products in any meaningful way. The notion of “scale-out deployments” hadn’t quite hit the mainstream.
Michael shared with me that all of their internally developed specifications are shared with multiple vendors, and manufacturing proposals are reviewed internally through a democratic process. Each proposal is analyzed according to a number of factors.
When a hardware design is approved for manufacturing, Facebook always uses two vendors for production. The end result is two identical products from two distinct vendors, which gives Facebook supply-chain diversity and improved product continuity, both of which are important when dealing with production runs at the scale Facebook demands.
Michael pointed out that all of the benefits of scale-out development — power, cooling, ease of access — benefit small and medium business consumers just as much as enormous data centers. He also shared that the response to the Open Compute Project was unexpected: the reference designs were adopted by participants in a number of different markets and tweaked to provide the kinds of benefits needed in those markets.
Historically, large-scale providers have been cagey about discussing the details of their infrastructure. As a result of the Open Compute Project, more and more organizations are growing comfortable talking about the specifics of their data centers. This is slowly resulting in better design and implementation decisions, which will in the long run be better for the environment.
Say what you will about Facebook’s business and marketing decisions, but you can’t deny that they’re doing the world a favor by reducing waste in computer manufacturing. These issues will only grow more important as more and more technology is manufactured. The Open Compute Project is a great start, and we need more involvement in efforts like it. We also need to make sure that we’re adequately dealing — as an industry — with the proper disposal of end-of-life hardware.