Twitter's Open Source Big Data Tool Comes to the Cloud Courtesy of Nodeable

Usually when we think of a pivot, we think of a company that has decided to drop its core offering and market a different product or service. Obvious Corporation put Odeo up for sale and focused on Twitter. Burbn shuttered its location check-in service and became Instagram. But Nodeable’s pivot isn’t that sort of pivot.

Today Nodeable launched a new service called StreamReduce, a cloud-hosted real-time big data analytics product. StreamReduce is based on the same architecture as Nodeable’s existing IT operations monitoring tool. The company is keeping its current service, but is expanding its scope by marketing beyond its current base of developers and system administrators.

At the heart of StreamReduce is Storm, a real-time analytics engine that was originally developed at BackType, a company that was acquired by Twitter last year. After the acquisition, Twitter allowed lead developer Nathan Marz to finish the project and open source it. Twitter is now using Storm internally.

StreamReduce is essentially Storm hosted in the cloud, with a few extras such as connectors to Apache Hadoop. Nodeable CEO Dave Rosenberg explains that Storm is meant to complement, not replace, Hadoop. Hadoop is great for running analytics on huge data sets that you’ve already collected, but it’s not good for processing streams of incoming data. That’s where Storm and StreamReduce come in.

Storm isn’t the only project trying to solve the big data streaming problem. Apache S4 is an open source project originally developed by Yahoo that provides similar functionality, and HStreaming offers a proprietary product that adds real-time capabilities to Hadoop. But Storm is the project that seems to be gaining the most traction. For example, the contact management startup FullContact chose Storm over other options.

“We wanted to try open source first, as it keeps our options wide open should we want to change technologies. That ruled out HStreaming,” explains FullContact CTO Dan Lynn. “S4 was very interesting, but I didn’t get the impression that it had captured the enthusiasm of the developer community as well as Storm.” Nodeable chose to use Storm as its base for similar reasons.

Nodeable launched last year as a challenger to Splunk, the big data company that IPOed earlier this year. Splunk sells an on-premises tool for collecting and analyzing large data sets. It’s become best known for handling machine-generated data, mostly system log files from servers, but it can be used for pretty much any data set.

Nodeable offers a similar solution, but tuned specifically for IT operations data and hosted in the cloud, along with real-time capabilities and a Twitter-like interface. Rosenberg says the service had plenty of users – so why the pivot? “As the company grew we realized we needed to either go further into management, or further into analytics,” he explains. What users wanted was better analytics – there are plenty of IT management solutions available. As the Nodeable team delved deeper into analytics, they found the core problem with most analytics tools is a disconnect between the needs of business users, who want dashboards and alerts, and the needs of data analysts, who want to run custom queries.

Part of this problem stems from a divide between the need to run batch queries on historical data, which Hadoop is good at, and the need to process incoming streams of data for actionable intelligence, which is what Storm is built for. What Nodeable needed was a way to get data in and out of Hadoop, and to process certain data before it even hit Hadoop. With those problems solved, the Nodeable team realized that they’d stumbled onto something more broadly applicable than their log-file-centric service. “We ended up solving a problem that we hadn’t really set out to solve,” says Rosenberg. He also says Nodeable will open source much of its code, including reference implementations for Storm analytics and an agent for collecting analytics from Amazon EC2 instances.

Nodeable will be going up against companies like HStreaming, which has a partnership with Microsoft to bring its real-time solution to Hadoop on Azure. Also, last year Amazon Web Services posted a job listing for someone “to lead the team that is building a disruptive new service for processing Big Data streams,” suggesting that we may see a real-time complement to the cloud provider’s Elastic MapReduce service. Expect strong competition in this space in the coming months.