DataTorrent, A Big Data Platform Built On Hadoop, Can Process A Billion Events Per Second

DataTorrent made its primary product, DataTorrent RTS, generally available today. The product is built on top of Hadoop 2.0 and allows companies to process massive amounts of data in real time.

According to Phu Hoang, a co-founder of the company who joined Yahoo! in 1996 as one of its first engineers, the product can process more than a billion data events per second. Before products like DataTorrent, companies could only look back at what had already happened rather than process data as it arrived, and that’s a big change.

As Alex Williams reported in an article last year on TechCrunch, “The data is processed in-memory. DataTorrent crunches it and correlates different dimensions and adapts in real-time as data volumes expand and detract in the normal course of the day.”
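To make the idea of in-memory processing across dimensions concrete, here is a minimal Python sketch. It is not DataTorrent's API; the `InMemoryRollup` class, the `region` and `device` dimensions, and the `value` field are all hypothetical names chosen for illustration. It simply keeps running counts and totals in memory, keyed by whichever dimensions you pick, and updates them as each event arrives.

```python
from collections import defaultdict


class InMemoryRollup:
    """Hypothetical helper: running aggregates held in memory per dimension key."""

    def __init__(self, dimensions):
        self.dimensions = tuple(dimensions)
        self.counts = defaultdict(int)    # events seen per key
        self.totals = defaultdict(float)  # summed "value" per key

    def update(self, event):
        # Build the grouping key from the chosen dimensions of this event.
        key = tuple(event[d] for d in self.dimensions)
        self.counts[key] += 1
        self.totals[key] += event["value"]
        # Return the up-to-date aggregate immediately, with no batch step.
        return key, self.counts[key], self.totals[key]


rollup = InMemoryRollup(["region", "device"])
events = [
    {"region": "us", "device": "mobile", "value": 2.0},
    {"region": "us", "device": "mobile", "value": 3.0},
    {"region": "eu", "device": "desktop", "value": 1.5},
]
latest = [rollup.update(e) for e in events]
```

Each call to `update` returns the current aggregate for that event's key, so a consumer sees the new total as soon as the event is processed rather than after a later batch job.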

While still at Yahoo!, Hoang helped oversee the development of Hadoop 1.0 and 2.0, and he saw the potential of Hadoop as a big data processing platform.

After Hoang left Yahoo! in 2007, he talked to many companies that had big data problems. One of the biggest was latency: they were looking at events only after they had happened. Reviewing the data showed them what had happened, but by then it was too late to act. DataTorrent aims to give those customers the ability to see data as it’s generated and react to these events in real time.

That’s the promise of big data: changing the way you do business by letting you react to this massive amount of information. If the events are over, you can only use the information to do better next time, but if you’re seeing it as it happens, you can react immediately and, potentially at least, have a real impact on how you do business.

Hoang and fellow co-founder Amol Kekre both came out of the Yahoo! Hadoop development team, and they saw a business opportunity in this problem. It’s worth noting that most of the company’s engineering team also came from Yahoo!, and that Yahoo! co-founder Jerry Yang is one of the investors.

Hoang says the company is working with large companies that are about to experience an explosion of data, especially as more of it comes from sensors and machines, a trend known as the Internet of Things. He says data today is “big,” but it’s mostly human generated. Over the next three to five years, that will pale in comparison to the data coming in from these machine-generated sources.

Companies can build applications on top of the DataTorrent platform to process whatever type of data is important to them. That could be financial data, weather data, advertising bids or anything else from a business that could benefit from real-time processing.

Hoang says the platform connects to the most popular databases and provides ways to feed dashboards with real-time insights. What’s more, customers can build applications with automatic triggers that act when certain conditions are met, such as buying or selling, or that alert humans when a situation is more complex and requires more sophisticated decision making.
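The trigger idea can be sketched in a few lines of Python. This is an illustration only, not DataTorrent's actual API: the `Event` and `Trigger` types, the `run_triggers` function, the price thresholds and the `"auto"` and `"alert"` labels are all assumptions made up for this example. Each incoming event is checked against a set of conditions, and a match is routed either to an automatic action or to a human.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass
class Event:
    symbol: str
    price: float


@dataclass
class Trigger:
    name: str
    condition: Callable[[Event], bool]
    action: str  # hypothetical labels: "auto" acts immediately, "alert" notifies a person


def run_triggers(events: Iterable[Event], triggers: List[Trigger]) -> List[Tuple[str, str, str, float]]:
    """Check every event against every trigger as it arrives and record what fired."""
    fired = []
    for event in events:
        for trigger in triggers:
            if trigger.condition(event):
                fired.append((trigger.action, trigger.name, event.symbol, event.price))
    return fired


# Thresholds are invented for the example.
triggers = [
    Trigger("sell_above_110", lambda e: e.price > 110.0, "auto"),
    Trigger("review_below_50", lambda e: e.price < 50.0, "alert"),
]
stream = [Event("ACME", 100.0), Event("ACME", 112.5), Event("ACME", 45.0)]
fired = run_triggers(stream, triggers)
```

In this sketch the first event matches nothing, the second fires the automatic sell rule, and the third is handed to a person for review. A real deployment would read from a live stream rather than a list, but the per-event check is the same shape.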

David Hornik of August Capital, whose firm provided DataTorrent with $8M in Series A financing last year, says that while Hadoop is clearly popular, this product lets companies take advantage of it in a way that hasn’t been possible until now.

“Everybody acknowledges big data matters, and Hadoop is perfect container, but that doesn’t help run your business. How do you take advantage of these nodes distributed around the world?” he asks.

Hornik was also an early stage investor in Splunk, and he sees lots of potential here for DataTorrent. “When you can process a billion data points in a second, there are a lot of possibilities.”

Hadoop is an open platform that has the potential to create many new businesses, and the Yahoo!-Hadoop team has generated some of its own. These include Pepperdata, a Hadoop monitoring and scheduling tool run by a couple of former Yahoo! engineers, and Altiscale, a cloud-based version of Hadoop run by Yahoo!’s former CTO, Raymie Stata.