Basho open-sources its Riak TS database for the Internet of Things

It seems as though every device manufacturer in the world wants to connect its products to the internet, from mattresses and washing machines to toasters and juicers. There’s so much data out there that is just sitting around waiting to be analyzed.

The amount of this kind of data that exists will only go up, and the capacity to transmit it is slowly becoming available from companies like SigFox (which has raised a whopping $150 million for networks like this). However, most distributed database architectures can’t read and write data at rates that match the bandwidth the rest of the market is moving toward.

Seattle-based Basho is trying to solve some of these problems with the latest release of its NoSQL Riak TS database. The TS stands for time series: data points keyed by the timestamp at which they were produced. While the TS system has been available to Basho’s enterprise clients (which include the likes of Uber and AT&T) for a short time now, the open-source release marks the first time developers have access to a platform built specifically for data of this type.

Among peers like MongoDB and DataStax, Basho is the least well capitalized, with only $25 million in financing coming in over the past year. It’s clear that the company is pinning its hopes on time-series data to give it an edge over competitors in the NoSQL space.

The new release also provides integration with the Apache Spark cluster framework, with automatic distribution and interaction of data for in-memory processing in Spark and storage in Riak TS.

This may seem trivial to most, but as anyone who’s played around with large amounts of time-series data from sensors knows, distributed data at scale can often lead to long read/write times even during compute runs, with the redundancy of distribution being efficiency’s undoing.

Most solutions distribute keys evenly around a cluster using a hash ring. That is normally efficient, but it scatters data from the same time range across large swaths of nodes, which makes reading a range a high-load operation.
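To make the problem concrete, here is a minimal Python sketch of hash-based placement. It is an illustration only: the eight-node cluster, the `sensor-1` key format and the simple modulo mapping are assumptions made for this example, not how Riak, Cassandra or any other product actually assigns keys.

```python
import hashlib

NUM_NODES = 8  # hypothetical cluster size, chosen for illustration


def node_for_key(key: str, num_nodes: int = NUM_NODES) -> int:
    """Map a key to a node by hashing it (plain modulo placement, not a real hash ring)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def nodes_touched(start_ts: int, end_ts: int, step: int = 1) -> set:
    """Return the nodes holding readings with timestamps in [start_ts, end_ts)."""
    # The key format "sensor-1:<timestamp>" is a made-up example.
    return {node_for_key(f"sensor-1:{ts}") for ts in range(start_ts, end_ts, step)}


# Ten minutes of once-per-second readings from a single sensor.
touched = nodes_touched(1_460_000_000, 1_460_000_600)
print(f"{len(touched)} of {NUM_NODES} nodes hold part of the range")
```

Because each timestamp hashes independently, consecutive readings land on unrelated nodes, so even a short time window forces a query to contact much of the cluster.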

Basho’s CEO Adam Wray said that the unique distribution system that Riak TS uses gives users an edge when working with time-stamped or other continuous data.

“We optimized the data placement so that specific nodes would get specific ranges of data,” he said, explaining that this placement leads to fewer operations to fetch data from a certain time range.
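The general idea Wray describes can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not Basho’s implementation: the 15-minute bucket size, the eight-node cluster and the function names are all hypothetical. Instead of hashing each raw timestamp, the sketch hashes the series name together with a coarse time bucket, so every reading in the same bucket lands on the same node.

```python
import hashlib

NUM_NODES = 8          # hypothetical cluster size
QUANTUM_SECONDS = 900  # hypothetical 15-minute time bucket


def node_for_reading(series: str, ts: int) -> int:
    """Place a reading by hashing its series name plus its time bucket."""
    bucket = ts // QUANTUM_SECONDS
    digest = hashlib.sha256(f"{series}:{bucket}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_NODES


def nodes_for_range(series: str, start_ts: int, end_ts: int) -> set:
    """Return the nodes a query must contact for readings in [start_ts, end_ts)."""
    first_bucket = start_ts // QUANTUM_SECONDS
    last_bucket = (end_ts - 1) // QUANTUM_SECONDS
    return {
        node_for_reading(series, bucket * QUANTUM_SECONDS)
        for bucket in range(first_bucket, last_bucket + 1)
    }


# Ten minutes of readings that fall inside a single 15-minute bucket.
print(nodes_for_range("sensor-1", 900_000, 900_600))
```

A query for a time window now contacts at most one node per bucket it overlaps, rather than a node per reading. The trade-off is that a very busy series concentrates its recent writes on a single node for the duration of each bucket.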

While individual developers will certainly benefit from the REST API that the new release provides, it’s Riak TS’ compatibility with existing SQL database commands that Basho thinks will make it a hit in the enterprise space.

“It’s normal SQL [commands], not some CQL or some variant of our own of SQL,” said Dave McCrory, Basho’s Chief Technology Officer. “We support the traditional operations that people care about.”
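For readers unfamiliar with what such a query looks like, here is a small Python sketch of a plain SQL time-range lookup. It runs against SQLite from Python’s standard library purely as a stand-in; it does not use Riak TS or its client libraries, and the `readings` table and its columns are invented for this example.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in for a SQL-speaking store.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE readings (
        sensor      TEXT    NOT NULL,
        ts          INTEGER NOT NULL,  -- Unix timestamp in seconds
        temperature REAL,
        PRIMARY KEY (sensor, ts)
    )
    """
)

# Ten made-up readings, one per minute.
rows = [("sensor-1", 1_460_000_000 + i * 60, 20.0 + i) for i in range(10)]
conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", rows)

# An ordinary SQL range query over the timestamp column.
result = conn.execute(
    "SELECT ts, temperature FROM readings "
    "WHERE sensor = ? AND ts >= ? AND ts < ? ORDER BY ts",
    ("sensor-1", 1_460_000_120, 1_460_000_300),
).fetchall()

for ts, temperature in result:
    print(ts, temperature)
```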

While support for the most common SQL operations will certainly be attractive to legacy users and many enterprises, the custom solutions that many enterprise users build in-house on top of SQL platforms might stymie Riak TS’ widespread adoption in the enterprise space.

Nodes in Riak can be distributed across both virtualized and physical machines, as well as cloud instances on platforms like Microsoft’s Azure or Amazon Web Services.

Basho claims that the Riak TS system operates over 50% faster on time-series data compared to competitors like the Apache Software Foundation’s open-source NoSQL Cassandra database. TechCrunch has not independently verified these statistics, but with today’s open-source release, users can measure the performance of Riak TS for themselves.

It remains to be seen if the system’s inbuilt resiliency is enough to sway enterprise users. Each cluster stores three identical copies of every bit that it contains, and multi-cluster replication can increase this number a theoretically infinite number of times. At scale, this leads to pretty high uptime and low fault rates, but at a cost that many smaller users might not be willing to bear in terms of system size.
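The storage cost of that redundancy is simple arithmetic, sketched below in Python. The three-copies-per-cluster figure comes from the description above; the assumption that every additional replicated cluster holds its own full set of three copies, and the function name itself, are ours for illustration.

```python
REPLICAS_PER_CLUSTER = 3  # three identical copies per cluster, as described above


def raw_storage_tb(logical_tb: float, clusters: int = 1,
                   replicas_per_cluster: int = REPLICAS_PER_CLUSTER) -> float:
    """Estimate raw disk needed for a given amount of logical data.

    Assumes every cluster stores a full, independently replicated copy.
    """
    if clusters < 1:
        raise ValueError("clusters must be at least 1")
    return logical_tb * replicas_per_cluster * clusters


# 10 TB of sensor data in one cluster, then mirrored to a second cluster.
print(raw_storage_tb(10))
print(raw_storage_tb(10, clusters=2))
```

Under these assumptions, 10 TB of sensor data needs 30 TB of raw disk in a single cluster and 60 TB once it is mirrored to a second one, which is the kind of overhead smaller users may balk at.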