Citus Data has launched CitusDB for Hadoop, a service that can process petabytes of data within seconds. The offering shows once again that the new class of analytics databases, which can analyze everything from data to entire libraries of digital books, is the next big thing.
CitusDB is based on Google Dremel, a real-time analytics database that has surpassed Hadoop’s analysis capabilities. The difference is in its parallel-computing capabilities and SQL-like functionality. Run a query across petabytes of data over thousands of servers and the results come back in real time.
That’s far faster than Hadoop, which uses what is known as “batch processing” to analyze data. The data gets analyzed with an inherent delay that ranges from minutes to hours or days.
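The general idea behind this kind of parallel query, often called scatter-gather, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how CitusDB or Dremel is actually implemented: the `SHARDS` data, `partial_sum` and `parallel_query` are hypothetical names, and in-memory lists stand in for data spread across servers. Each worker aggregates its own shard, and a coordinator merges the partial results.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory "shards" standing in for data spread across servers.
SHARDS = [
    [("us", 3), ("eu", 5)],
    [("us", 7), ("asia", 2)],
    [("eu", 1), ("asia", 4)],
]


def partial_sum(shard):
    """Worker step: aggregate one shard locally, much like
    SELECT region, SUM(n) ... GROUP BY region run on a single node."""
    totals = {}
    for region, n in shard:
        totals[region] = totals.get(region, 0) + n
    return totals


def parallel_query(shards):
    """Coordinator step: send the same query to every shard at once,
    then merge the partial results into one answer."""
    merged = {}
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        for partial in pool.map(partial_sum, shards):
            for region, total in partial.items():
                merged[region] = merged.get(region, 0) + total
    return merged


result = parallel_query(SHARDS)
print(result)
```

Because every shard is scanned at the same time, the wait is set roughly by the slowest shard rather than by the total data size, which is the intuition behind interactive response times on very large data sets.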
Citus is similar to other analytics databases such as Cloudera Impala and Apache Drill, both of which are also based on Dremel. Executives with Citus say the differentiator is its use of PostgreSQL, the widely used open-source database that can query a wide set of data types and is familiar to the enterprise.
CitusDB’s innovation lies in its high-performance parallel processing capability within the Postgres core, said Matt Ocko, a partner with Data Collective and an investor in the company. The result: an analyst can run one query and get sub-second results across all of an enterprise’s data. The data can be on any physical server within Hadoop or in a dynamic environment like MongoDB. It can run on-premises or on a cloud platform such as Amazon Web Services. Parallel-processing capabilities offer customers full-text search, geo search and enterprise-level security, a benefit of Postgres’s position in the enterprise market.
Citus is one of those companies with genuinely innovative technology: software that runs across a horizontal infrastructure to do analysis without requiring a fortune in proprietary hardware and software. This is what the big enterprise players are up against in this new round of innovation: software that can run on cheap hardware to pull out data and do analysis. It’s far cheaper than what Informatica or HP Vertica offers.
Citus represents the innovation happening in the analytics database market. It’s still early, and the competition comes not just from Cloudera but also from Google, which now offers API access to BigQuery, itself based on Google Dremel. There are also startup competitors such as Drawn To Scale, which offers Spine, a real-time database for large applications built on Hadoop.