Google BigQuery is getting a number of new updates to make it easier to analyze large amounts of data quickly and at a lower price. BigQuery is designed to process terabytes of data, and today’s updates should provide a greater degree of flexibility for ad hoc analysis of extremely large datasets and allow for more sophisticated analysis.
There are six new features in all: unlimited result sizes for queries that process terabytes of data; window functions for more advanced analytics; query caching, which avoids the cost of recomputing an unchanged query; instant estimates of what a query will cost before it runs; a drop in storage prices; and support for larger workloads, with interactive query quotas doubled for all users.
In terms of analytics, BigQuery’s window functions now give customers new “ways to rank results, explore distributions and percentiles, and traverse results without the need for a self join.”
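To make the idea concrete, here is a minimal Python sketch of what a SQL window function such as `RANK() OVER (PARTITION BY dept ORDER BY revenue DESC)` computes; the column names and sample rows are hypothetical, and the point is that each row gets a rank within its group without joining the table to itself.

```python
# Hypothetical illustration of a ranking window function in plain Python.
# The data and column names (dept, revenue) are invented for the example.
from collections import defaultdict

rows = [
    {"dept": "ads",    "revenue": 300},
    {"dept": "ads",    "revenue": 500},
    {"dept": "search", "revenue": 900},
    {"dept": "search", "revenue": 700},
]

def rank_within_partition(rows, partition_key, order_key):
    """Assign a rank to each row within its partition, highest value first."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[partition_key]].append(row)
    ranked = []
    for part in partitions.values():
        part.sort(key=lambda r: r[order_key], reverse=True)
        for i, row in enumerate(part, start=1):
            ranked.append({**row, "rank": i})
    return ranked

for row in rank_within_partition(rows, "dept", "revenue"):
    print(row)
```

The self-join alternative would match every row against every other row in its group and count how many outrank it, which is far more expensive at BigQuery's scale.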
Cost is a factor when data is processed at this scale, and a new user interface feature helps users keep it in check. Once a query’s syntax is valid, the UI tells the customer how much the query would cost to run.
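Since BigQuery bills queries by the amount of data they scan, the estimate the UI surfaces reduces to simple arithmetic: bytes processed times a per-byte rate. The sketch below assumes an illustrative rate of $0.035 per gigabyte processed; the actual price is not stated in the article.

```python
# Hypothetical sketch of a per-query cost estimate.
# PRICE_PER_GB_PROCESSED is an assumed illustrative rate, not the real price.
PRICE_PER_GB_PROCESSED = 0.035  # USD per GiB scanned (assumption)

def estimate_query_cost(bytes_processed: int) -> float:
    """Return the estimated cost in USD for scanning `bytes_processed` bytes."""
    gigabytes = bytes_processed / (1024 ** 3)
    return gigabytes * PRICE_PER_GB_PROCESSED

# A query scanning 2 TiB of table data:
print(f"${estimate_query_cost(2 * 1024 ** 4):.2f}")  # → $71.68
```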
Storage prices will also drop from $0.12 per gigabyte per month to $0.08 per gigabyte per month. High-volume users will also soon be able to opt in to tiered query pricing.
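The savings scale with dataset size. Using the article’s two rates, here is the monthly difference for a hypothetical 10 TB dataset:

```python
# Monthly storage cost before and after the price drop, using the rates
# from the article; the 10 TB dataset size is a hypothetical example.
OLD_RATE = 0.12  # USD per GB per month
NEW_RATE = 0.08  # USD per GB per month

def monthly_storage_cost(gigabytes: float, rate_per_gb: float) -> float:
    """Return the monthly storage bill in USD."""
    return gigabytes * rate_per_gb

gb = 10 * 1024  # hypothetical 10 TB dataset
old = monthly_storage_cost(gb, OLD_RATE)
new = monthly_storage_cost(gb, NEW_RATE)
print(f"old ${old:.2f}, new ${new:.2f}, saved ${old - new:.2f}")
# → old $1228.80, new $819.20, saved $409.60
```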
BigQuery is meant for processing billions of rows of data. It’s based on Dremel, Google’s internal real-time ad hoc query system, which delivers interactive analysis at speeds Hadoop’s batch-oriented MapReduce cannot match.
There is a growing movement behind open-source versions of Dremel. Apache Drill is one such project; Cloudera’s Impala is an open-source real-time query engine in the same vein. In February, startup Citus Data launched CitusDB for Hadoop, a service that it says can query petabytes of data within seconds.