Sqrrl, a big data startup with links to the NSA, announced $5.2 million in Series A funding from existing investors Atlas Venture and Matrix Partners, which will be used to further the development and commercialization of its scalable NoSQL database, “Sqrrl Enterprise.” An updated version of this product (version 1.2) is also shipping today with additional analytic, security and performance features and improvements, the company says.
The company’s history is interesting, especially given the far-reaching implications of the PRISM revelations and the extent to which the U.S. government and others have been sweeping up and analyzing their citizens’ data. Sqrrl is powered by Apache Accumulo, a project that got its start back in 2008 when the NSA began to look for a data store that could meet its growing needs to store, secure and analyze large data sets more economically.
As Sqrrl explains on its site, the NSA requirements included:
- Elastic scalability to petabytes of data
- Flexible schemas that could easily accommodate unstructured or semi-structured data
- Fine-grained security controls that would enable the mixing of data with different security requirements
- The ability to run on inexpensive, commodity hardware to minimize costs
- Very fast read and write access to the data
- Rapid application development for big data analytics
When no existing product met its needs, the NSA built one itself with some help from Google’s paper about Bigtable. The resulting data store, called Accumulo, was later adopted by the Department of Defense and the U.S. Intelligence Community and by various companies. In 2011, Accumulo was open sourced, and in spring 2012 it became a top-level project at the Apache Software Foundation. That summer, a core group of its creators, committers and contributors formed Sqrrl to give organizations and businesses secure, scalable and easily adaptable ways to manage their own big data stores.
When Sqrrl raised its $2 million seed round last summer, its founders explained how the company’s “cell level security” is compliant with regulations like HIPAA and Sarbanes–Oxley that could otherwise get in the way of performing big data analysis.
Instead of breaking down data into secure chunks, then analyzing it, Sqrrl takes the different elements of the data and makes them secure down to the cell level. In layman’s terms, a way to understand cell level security is to think about adding permissions on an Excel spreadsheet. While you could secure the folder containing the file, or the file itself, Sqrrl would allow you to protect each individual cell in that spreadsheet with different permissions. In a health record, that means that individual elements like age, gender or diagnosis, for example, are each secured and can then be combined in any number of ways when performing big data analysis, while still maintaining compliance.
This granular control is why Sqrrl makes sense for organizations with strict security needs, such as those in the government, financial services, healthcare, telecommunications and energy industries.
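To make the spreadsheet analogy concrete, here is a minimal Python sketch of the idea of cell-level security. It is an illustration only and does not use Sqrrl’s or Accumulo’s actual API: the `Cell` class, the `visible_cells` function and the label names (“research,” “clinical,” “identity”) are all hypothetical, and the rule that a reader must hold every label on a cell is a simplifying assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """One value in a record, tagged with its own access labels (hypothetical model)."""
    row: str             # record identifier, e.g. a patient ID
    column: str          # field name, e.g. "age"
    value: str
    required: frozenset  # labels a reader must hold, all of them (assumption)


def visible_cells(cells, authorizations):
    """Return only the cells whose required labels are all held by the reader."""
    auths = frozenset(authorizations)
    return [c for c in cells if c.required <= auths]


# A single health record, with each field secured separately.
record = [
    Cell("patient-1", "age", "47", frozenset({"research"})),
    Cell("patient-1", "gender", "F", frozenset({"research"})),
    Cell("patient-1", "diagnosis", "asthma", frozenset({"research", "clinical"})),
    Cell("patient-1", "name", "Jane Doe", frozenset({"clinical", "identity"})),
]

# The same record yields different views depending on who is reading it.
researcher_view = {c.column: c.value for c in visible_cells(record, {"research"})}
clinician_view = {
    c.column: c.value
    for c in visible_cells(record, {"research", "clinical", "identity"})
}
```

In this sketch, a reader holding only the “research” label sees age and gender but not the diagnosis or the name, while a reader holding all three labels sees the full record. The filtering happens per cell rather than per file or per table, which is the point of the analogy.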
Sqrrl’s software, built on Apache Accumulo, adds functionality and a user-interface layer on top of a Hadoop-based database. Apache Accumulo itself competes with HBase, another database built on Hadoop for big data analysis. (As we previously noted, startups like Drawn to Scale are building their solutions on HBase.)
Meanwhile, Accumulo’s acceptance is growing. Last month, Cloudera announced support for Accumulo, which Sqrrl said was a “validation of the unique features of Accumulo, including its cell-level security capabilities,” and concluded that support from other Hadoop vendors would soon be on the way.
“Sqrrl is filling an important gap in the Big Data market,” noted Antonio Rodriguez, partner at Matrix Partners and a Sqrrl board member, in a statement. “As the company focused on helping organizations building secure, multitenant, real-time Big Data applications, Sqrrl is enabling these organizations to bring together diverse datasets with complex security requirements.”
Sqrrl Co-Founder Ely Kahn said in a phone interview that the funding will pay for doubling the company’s staff, particularly engineers, in order to extend its security features and analytics.
With the funding, the company will improve the Sqrrl database’s graph analysis, full-text search and support for SQL-type languages.
For security, Sqrrl will further support attribute-based access controls that allow finer-grained control over who gets access to what data. Sqrrl goes beyond role-based security to grant access according to the time of day, the location and other contextual attributes. For example, in a banking scenario, only people in certain countries would have access to certain data, preventing data leakage across borders.
This allows Sqrrl to support multi-tenancy in a more powerful way. In a large shared repository of data, different users will have different access rights. With attributes, a customer can be more specific about the conditions under which that access is granted.
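The difference between role-based and attribute-based access can be sketched in a few lines of Python. This is an illustrative model under stated assumptions, not Sqrrl’s implementation: the `AccessRequest` and `Policy` classes, the `is_allowed` function, the country codes and the business-hours window are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessRequest:
    """Attributes of one request (hypothetical model)."""
    role: str
    country: str  # country code where the request originates
    hour: int     # local hour of day, 0-23


@dataclass(frozen=True)
class Policy:
    """Conditions attached to a data set (hypothetical model)."""
    allowed_roles: frozenset
    allowed_countries: frozenset
    start_hour: int  # inclusive
    end_hour: int    # exclusive


def is_allowed(request, policy):
    """Grant access only if role, location and time of day all match the policy."""
    return (
        request.role in policy.allowed_roles
        and request.country in policy.allowed_countries
        and policy.start_hour <= request.hour < policy.end_hour
    )


# Banking example: account data readable only by analysts inside one country,
# and only during an assumed 08:00-18:00 window.
swiss_accounts = Policy(frozenset({"analyst"}), frozenset({"CH"}), 8, 18)

in_country = is_allowed(AccessRequest("analyst", "CH", 10), swiss_accounts)
abroad = is_allowed(AccessRequest("analyst", "US", 10), swiss_accounts)
after_hours = is_allowed(AccessRequest("analyst", "CH", 22), swiss_accounts)
```

A purely role-based check would stop at the first condition and treat all three requests the same, since each comes from an analyst. Adding location and time as attributes is what lets the second and third requests be refused.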
Sqrrl competes with other NoSQL databases and providers such as Wibidata, Continuuity and LucidWorks. But the company’s advantage lies in its fine-grained controls and the deep connections it has made through its years in the national security arena.