Google bought Kaggle in 2017 to provide a data science community for its big data processing tools on Google Cloud. Today, the company announced a new direct integration between Kaggle and BigQuery, Google’s cloud data warehouse.
More specifically, data scientists can build a model in a Kaggle Jupyter Notebook, which the community calls a Kaggle Kernel. You can then link the notebook directly to BigQuery through the tool’s API, which makes it much simpler to query the data in the warehouse using SQL, a language data scientists tend to be very familiar with.
The benefit of this approach, according to Google, is that you don’t have to actually move or download the data to query it or perform machine learning on it. “Once your Google Cloud account is linked to a Kernels notebook or script, you can compose queries directly in the notebook using the BigQuery API Client library, run it against BigQuery, and do almost any kind of analysis from there with the data,” Google wrote in a blog post introducing the integration.
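To make that workflow concrete, here is a minimal Python sketch of what a query from a notebook might look like, assuming the `google-cloud-bigquery` client library is installed and a Google Cloud account is already linked. The project ID `my-project`, the helper names `build_query`, `run_query` and `main`, and the choice of the public `bigquery-public-data.samples.shakespeare` table are illustrative assumptions, not details from Google's announcement.

```python
from typing import Any

# Public BigQuery sample table, chosen here only for illustration (assumption).
SAMPLE_TABLE = "bigquery-public-data.samples.shakespeare"


def build_query(table: str, limit: int = 10) -> str:
    """Compose a SQL query that totals word counts per corpus in `table`."""
    return (
        "SELECT corpus, SUM(word_count) AS total_words\n"
        f"FROM `{table}`\n"
        "GROUP BY corpus\n"
        "ORDER BY total_words DESC\n"
        f"LIMIT {int(limit)}"
    )


def run_query(client: Any, sql: str):
    """Run `sql` in BigQuery and return the result as a pandas DataFrame.

    `client` is expected to be a google.cloud.bigquery.Client. The data stays
    in the warehouse; only the query result comes back to the notebook.
    """
    return client.query(sql).to_dataframe()


def main() -> None:
    """Hypothetical notebook cell: needs a linked Google Cloud account."""
    # Imported here so the helpers above can be used without the library.
    from google.cloud import bigquery

    # "my-project" is a placeholder project ID (assumption).
    client = bigquery.Client(project="my-project")
    df = run_query(client, build_query(SAMPLE_TABLE))
    print(df.head())
```

Calling `main()` from a notebook cell would send the query to BigQuery and print the first rows of the result. Only the aggregated rows are returned, which is the point Google makes about not having to move or download the underlying data.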
Data scientists, who have a particular way of working, get to keep working in a familiar fashion, which reduces the friction involved in building a model and running machine learning against the data. Instead of moving back and forth between tools, you can do all your work in one integrated environment, which should save time and effort in the long run.
What’s more, because Kaggle is a public community of data scientists, you can share Kernels should you choose to do so. Conversely, you can search the public repository and use existing Kernels as a starting point or as a reference to experiment with different types of data sets.
The Kaggle community also provides a means to discuss issues with other data scientists in an open way. The community has 3 million users and there are currently 200,000 Kernels available to explore in the public repository.