DataTorrent has raised $8 million in a round led by August Capital for its Hadoop-based, real-time data streaming platform. The company had previously raised $750,000 in a seed round with participation from Morado Ventures, former Yahoo chief Jerry Yang’s AME Cloud Ventures and well-known angel investor Farzad Nazem.
It is the first investment in the data analytics space for August Capital General Partner David Hornik since he invested in Splunk, the machine-data company that had an initial public offering last year.
With the Hadoop 2.0 platform as its foundation, DataTorrent has built a platform that can monitor tens of millions of events per second. With Hadoop 1.0, customers would get batch files that provided daily analysis from the Hadoop distributions they had set up to monitor their businesses. But now these customers want the data in real time. They want access to the analysis of the data that is getting stored in log files as it happens, not a day later.
Customers use the DataTorrent platform to hook into a message bus, which serves as a data integration platform for processing the millions of data events that they need to monitor. This may be sensor data from factory machines, weather data or any other source of data that needs monitoring. The DataTorrent real-time stream monitors the data and sends out alerts. So, for instance, a media advertising company can be alerted when a click-through rate dips below 5 percent. Alerts can also be created to warn a customer if a machine is going down, if too much memory is getting used or if any other relevant issue arises that the customer needs to know about.
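To make the click-through example concrete, here is a minimal Python sketch of a threshold alert computed over a sliding window of events. It is not DataTorrent's API or implementation; the names `AdEvent` and `CtrMonitor`, the 60-second window and the minimum-impression guard are all assumptions made for illustration. Only the 5 percent threshold comes from the example above.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class AdEvent:
    """One served ad impression (hypothetical event shape)."""
    timestamp: float  # seconds since some epoch
    clicked: bool     # True if the impression was clicked


class CtrMonitor:
    """Tracks click-through rate over a sliding time window, in memory."""

    def __init__(self, window_seconds=60.0, threshold=0.05, min_impressions=100):
        self.window_seconds = window_seconds
        self.threshold = threshold              # alert when CTR drops below this
        self.min_impressions = min_impressions  # avoid alerting on tiny samples
        self.events = deque()
        self.clicks = 0

    def observe(self, event):
        """Add one event; return an alert string, or None if all is well."""
        self.events.append(event)
        self.clicks += int(event.clicked)

        # Evict events that have fallen out of the window.
        cutoff = event.timestamp - self.window_seconds
        while self.events and self.events[0].timestamp <= cutoff:
            old = self.events.popleft()
            self.clicks -= int(old.clicked)

        impressions = len(self.events)
        if impressions < self.min_impressions:
            return None

        ctr = self.clicks / impressions
        if ctr < self.threshold:
            return f"ALERT: CTR {ctr:.1%} is below {self.threshold:.0%}"
        return None
```

A caller would feed `observe` one event at a time as messages arrive from the bus and forward any non-`None` result to whatever alerting channel it uses. A production system would also have to partition this state across machines and recover it after failures, which this sketch does not attempt.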
The data is processed in memory. DataTorrent crunches it, correlates different dimensions and adapts in real time as data volumes expand and contract in the normal course of the day.
The platform is fault-tolerant, meaning it is resilient to disruptions that could force a system down and cause data loss.
Co-Founder and CEO Phu Hoang started with Yahoo! in 1996 when there were five or six engineers. By the time he left, Hoang oversaw 3,000 engineers. Since then he has worked a lot with ex-Yahoo people who had their own businesses. He said he was helping a company in the online ad space when it became clear that they needed real-time results. The customer would buy advertising impressions in an ad exchange. They would then serve ads, earning revenue from those who clicked through. They would then get batch reports a day later. One day they might make $30,000 and the next they would lose $75,000.
He said he realized there was a shift happening. The amount of data companies were using had scaled and they needed to adapt to it in real time. He started the company about a year ago with Amol Kekre, who serves as CTO. Kekre was the architect and senior engineering manager for Yahoo! Finance. His work there included building Y! Finance’s real-time data architecture.
In the case of the media company, DataTorrent looks at the costs, the revenues, the volume of clicks and other factors in real time, even as the data comes in at unprecedented volumes. A customer can see immediately if the impressions they are buying are not performing.
The company says its streaming platform is designed to fit on top of Hadoop 2.0, making it compatible with Cloudera, MapR and other Hadoop distributions.
It’s machine data that needs processing in real time. There is a growing number of ways to get to the analysis of data that comes off the factory floor or Internet advertising. But data is so relative and often requires different filters on it to get the desired results. Google and the other data factories are all seeking to help customers find meaning in it. The question will become how prepared customers are to process data and analyze it for their own purposes.