The Efficient Cloud: All Of Salesforce Runs On Only 1,000 Servers

Earlier today, I sat in on a keynote presentation at Salesforce’s analyst event in New York City. CEO Marc Benioff and other Salesforce execs went over the earlier news that companies can now track Twitter conversations inside Salesforce. Naturally, I Twittered my notes (reproduced below). Salesforce is basically implementing Track (the ability to search and monitor conversations by keyword and topic) inside Salesforce, in a way that Twitter will hopefully make possible for all of its users.

But the data point I found most interesting had nothing to do with Twitter. Salesforce talked about its own back-end infrastructure and revealed that all of Salesforce runs on only about 1,000 servers. And that is mirrored, so it is really only 500. Think about that for a minute. Salesforce has more than 55,000 enterprise customers, 1.5 million individual subscribers, 30 million lines of third-party code, and hundreds of terabytes of data, all running on 1,000 machines. Amazon’s Web Services, in comparison, runs on about 100,000 machines, I am told by someone with knowledge of Amazon’s server infrastructure.

The comparison is not entirely fair because some of Amazon’s Web services, such as its EC2 compute cloud, are not shared among customers. (In other words, when a developer signs up for it, he gets a dedicated machine or portion of a machine running his compute “instance.”) But still, that is roughly a 100-to-1 efficiency advantage that Salesforce has over Amazon’s cloud. It gets this by running a proprietary codebase, a proprietary database, and a proprietary “multi-tenant optimizer” that slices and dices the data in a very efficient way.
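For anyone who wants to check the back-of-the-envelope math, here is a minimal sketch in Python using only the figures quoted above. The constant and function names are mine, and the Amazon number is a secondhand estimate, so treat the result as a rough ratio of server counts rather than a measured benchmark.

```python
# Figures as quoted in this post; the Amazon count is a secondhand estimate.
SALESFORCE_SERVERS = 1_000      # total machines, including the mirror
AMAZON_SERVERS = 100_000        # estimate from an unnamed source
ENTERPRISE_CUSTOMERS = 55_000   # "more than 55,000", used here as a floor


def server_ratio(larger: int, smaller: int) -> float:
    """Return how many machines the larger fleet has per machine in the smaller one."""
    return larger / smaller


# The fleet is mirrored, so only half of the machines are unique.
unmirrored_servers = SALESFORCE_SERVERS // 2

# Raw ratio of server counts between the two clouds.
amazon_to_salesforce = server_ratio(AMAZON_SERVERS, SALESFORCE_SERVERS)

# Enterprise customers served per machine, counting the mirror.
customers_per_server = ENTERPRISE_CUSTOMERS / SALESFORCE_SERVERS

print(unmirrored_servers, amazon_to_salesforce, customers_per_server)
```

The ratio only compares machine counts. It says nothing about how different the workloads on those machines are, which is the caveat raised above.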

All of Salesforce relies on data stored in only ten databases, each of which runs on about 50 servers. It holds several patents on ways to index the tens of millions of rows of raw data. But its secret weapon is that “optimizer,” which queries the databases and makes sense of all the data. This is all highly proprietary stuff. Benioff pooh-poohed open-source efforts that are less efficient: “We have real-time query optimization. We don’t use some out of the box open source query optimization. Those things don’t work.” Ouch.
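Salesforce did not show how any of this works under the hood, so the following is only a generic illustration of the multi-tenant idea, not Salesforce’s design. It assumes a single shared table keyed by a tenant ID and an ordinary index that starts with that ID, built with Python’s standard `sqlite3` module. The table, column, and function names are all hypothetical.

```python
import sqlite3

# Hypothetical schema: one shared table holds every tenant's rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE records (
        tenant_id INTEGER NOT NULL,
        record_id INTEGER NOT NULL,
        field     TEXT    NOT NULL,
        value     TEXT,
        PRIMARY KEY (tenant_id, record_id, field)
    )
    """
)
# An index that leads with tenant_id, so a lookup only touches one tenant's rows.
conn.execute(
    "CREATE INDEX idx_tenant_field_value ON records (tenant_id, field, value)"
)

# Made-up sample data for two tenants sharing the same table.
conn.executemany(
    "INSERT INTO records VALUES (?, ?, ?, ?)",
    [
        (1, 1, "account_name", "Acme"),
        (1, 2, "account_name", "Globex"),
        (2, 1, "account_name", "Initech"),
    ],
)


def find_records(db: sqlite3.Connection, tenant_id: int, field: str, value: str) -> list:
    """Return the record IDs for one tenant whose field equals value."""
    cursor = db.execute(
        "SELECT record_id FROM records "
        "WHERE tenant_id = ? AND field = ? AND value = ? "
        "ORDER BY record_id",
        (tenant_id, field, value),
    )
    return [row[0] for row in cursor.fetchall()]


print(find_records(conn, 1, "account_name", "Acme"))
```

Every query is scoped by the tenant ID, so one customer never sees another customer’s rows even though they all live in the same physical table. A real optimizer of the kind Benioff described would presumably do far more than this.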

Below are my Twitter notes from the event in chronological order (bold added for emphasis):

# Listening to Marc Benioff at press/analyst event. Says Salesforce has 30M lines of code written by others via AppExchange.

# Marc Benioff claims writing an app on the cloud is 5X faster and 5X cheaper than creating conventional enterprise apps. Good talking point.

# Salesforce stole Genius features from Apple, shows related deals. Benioff loves to borrow from consumer apps and make them enterprise.

# Wow, customer service support is now an $8B business, up from $5B in 2004. But customer satisfaction is flat. All going to the cloud.

# At Salesforce event, they are finally talking about Twitter. They claim 8M Twitter users. Wonder where they get that from.

# Salesforce CRM for Twitter is basically Track. Lets you monitor, search, track topics inside salesforce.

# Salesforce Twitter Track searches both original Tweets and replies, even if the keyword is not in the reply. Twitter, please take note.

# Salesforce Twitter feature imports the entire conversation when it gets a search hit.

# Salesforce Twitter integration is two-way. Cos. can monitor Twitter conversations, and then reply back, and the reply appears on Twitter.

# Benioff on need for new development environments for cloud apps: “These are not things you buy out of the box from Frys!”

# Salesforce hosts 13M customizations to its apps on its database.

# Salesforce has over 30M lines of 3rd party code. How does that compare to how many lines of code are in an Oracle or SAP app? Anyone know?

# Starbucks CTO saying they have gotten 65K ideas from customers in a year, MyStarBucksidea. Only 25 ideas implemented (like anti-spill stick)

# Starbucks CTO Bruzzo: tried to raise 1M hours of volunteer service in January both from stores and online. Built app in 21 1.3M hrs

# Benioff: “We have real-time query optimization. We don’t use some out of the box open source query optimization. Those things don’t work.”

# Salesforce enterprise customers can open up tunnels and share data with each other. Cool. It’s EDI for the masses.

# Salesforce manages hundreds of terabytes. Salesforce runs less than 1,000 machines. Salesforce running on about 10 databases worldwide.

# Each Salesforce database supported by about 50 servers, 2 mirrors, one codebase.

# All of the data of all the billions of rows in salesforce fits into only about 20 tables. Patents on indexing pivot tables.

# Salesforce takes the raw data and compresses. Creates a “multi-tenant index”: “shared massive structures with tens of billions of rows.”

# Salesforce’s secret sauce: It queries its databases with “The Multi-Tenant Optimizer.”

# Salesforce database going from tens of millions of rows to billions of rows per tenant/customer.

(Photo by JohnSeb).