Facebook Publishes Super Nerdy Big Data Engineering Blog Post To Attract Hardcore Coders

100-petabyte clusters! 60,000 Hive queries a day! Facebook’s latest 1,800-word engineering blog post has one goal: proving to the world’s top programmers that if they want a challenge, they should work for the social network. There’s not much for the layman beyond the fact that Facebook’s data warehouse is 2,500 times bigger than it was in 2008. This is back-end geek porn, and it’s critical to Facebook’s long-term success.

Facebook has the same talent retention problem as any tech startup that goes public. Without the massive upside of a little stock potentially being worth a lot of money one day, it’s tough to get the best coders, designers, product visionaries, and biz whizzes to come aboard, or to keep them from jumping ship.

There’s the lure of founding a company and calling the shots. There’s the excitement of joining an ass-kicking little startup as it hits its hockey stick. If Facebook can’t outshine those, it could stagnate in its maturity and become more vulnerable to disruption.

But Facebook has one thing young startups don’t have. Or should I say one billion things. Its massive user base means that what it builds seriously influences the world, and it’s trying to solve engineering problems at the forefront of computer science. At first glance, though, it might just seem like another consumer product. That’s why it needs blog posts like “Under the Hood: Scheduling MapReduce jobs more efficiently with Corona”.

The note details the limits of the Hadoop MapReduce scheduling framework, and how Facebook built its own scheduling framework, Corona, to surpass those limits. Facebook has open-sourced Corona and it’s now on GitHub. The benefits include dropping slot refill times from 10 seconds with MapReduce to just 600 milliseconds, cutting job latency in half, and improving cluster utilization and scheduling fairness. I’m not going to paraphrase the post any further, so if that stuff fascinates you, read it.

Facebook has been publishing engineering blog posts for years, but the Under The Hood series started right about when it filed to IPO. Older engineering posts were more about the human story of building Facebook’s back-end, but the posts seem to have gotten more hardcore since the company went public. And that’s smart, because Facebook doesn’t have the financial windfall of a rapidly rising valuation to attract engineers anymore.

Facebook must show it is a riddle, wrapped in a mystery, inside an enigma, because that’s what gets great programmers fired up.