Pepperdata Scores $15M For Hadoop Cluster Management

Pepperdata, a company that’s built a platform for managing and fine-tuning Hadoop cluster performance, announced a $15M Series B funding round today.

The round was led by new investors Wing Venture Partners along with Citi Ventures and Silicon Valley Data Capital. Prior investors Signia Ventures and Yahoo Chairman Maynard Webb also participated. Today’s money brings the company’s total investment to $20M.

Hadoop is an open source platform for managing huge amounts of data (big data). Pepperdata co-founder Sean Suchter ran the web search engineering team at Yahoo, which developed the first production use of Hadoop (according to information supplied by the company).

Pepperdata is trying to optimize Hadoop cluster performance. Hadoop is basically a dumb scheduler, Chad Carson, the other co-founder explained. It knows a job is ready to go, but it doesn’t have any sense of how to prioritize it based on needs or status within a company.

Pepperdata provides a way to fine-tune that scheduler, prioritizing Hadoop jobs to give resources to the ones that need them most, and ensuring that a company adheres to its service level agreement (SLA).

Often a job from a particular department will have higher priority, yet will get pushed out of the way by lower-priority work loads, Suchter explained.

“We’ll look at the jobs and see that a low priority guy is flooding the network or taking [hard] disk, and reach into the interfering job and slow it down just enough to give [the higher priority one access],” Suchter said.

He points out that it’s a subtle nudge, rather than brute force, cutting in line. The system knows how to dial down access for the lower priority job just enough to let the other one finish up.

The system does more than prioritize, however, Carson says. It also uses available capacity across the cluster more efficiently. Sometimes a job may take 4GB of memory when it only needs 2.5GB. Pepperdata can analyze usage across the system and free up capacity in cases where there are unused resources available — and thereby make the system run much more efficiently.

In fact, Pepperdata claims it only takes 15-30 minutes to install the product and upon installation, customers could see a 30-50 percent throughput gain as the system frees up resources and makes them available.

Pepperdata typically goes to a customer, installs the software and then monitors it for a week or so. It can then illustrate those capacity gains and fine-tune the system based on individual customer priorities.

Up until recently, Pepperdata customers were mostly early Hadoop adopters, already in production, who could see gains the fastest, but more recently the company has been focusing on specific verticals such as banking, healthcare and telecommunications.

Pepperdata, which was founded in 2012, has headquarters in Sunnyvale and recently opened a new northeast office in New York City. It has around 30 employees today, but plans to add more to product and sales and marketing with the new money over the coming year and be close to 50 by year’s end.

The company measures sales in nodes and currently claims around 10,000 licensed nodes, according to Suchter.