Google today announced that Dataset Search, a service that lets you search for close to 25 million different publicly available data sets, is now out of beta. Dataset Search first launched in September 2018.
Researchers can use these data sets to check their hypotheses or to train and test their machine learning models. They range from small ones, such as a count of how many cats there were in the Netherlands from 2010 to 2018, to large annotated audio and image sets. The tool currently indexes about 6 million tables.
With this release, Dataset Search is getting a mobile version and a few new features. The first is a filter that lets you choose which type of data set you want to see (tables, images, text, etc.), which makes it easier to find the data you’re looking for. In addition, the company has added more information about the data sets and the organizations that publish them.
A lot of the data in the search index comes from government agencies. In total, Google says, there are about 2 million U.S. government data sets in the index right now. But you’ll also regularly see Google’s own Kaggle show up, as well as a number of other public and private organizations that make public data available.
As Google notes, anybody who owns an interesting data set can make it available to be indexed by using standard schema.org markup to describe the data in more detail.
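For a rough idea of what that markup can look like, here is a minimal Python sketch that builds a schema.org `Dataset` description as JSON-LD and wraps it in a script tag for a web page. The `Dataset` and `DataDownload` types and the properties used are part of the schema.org vocabulary, but which fields to include is an assumption here, not a statement of what Google requires. The helper names, the example.org URLs and the sample values are all hypothetical placeholders.

```python
import json


def build_dataset_jsonld(name, description, url, publisher, download_url,
                         encoding_format="text/csv"):
    """Return a dict describing a data set with schema.org vocabulary.

    Hypothetical helper: the choice of properties is illustrative only.
    """
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "creator": {"@type": "Organization", "name": publisher},
        "distribution": [
            {
                "@type": "DataDownload",
                "encodingFormat": encoding_format,
                "contentUrl": download_url,
            }
        ],
    }


def to_script_tag(metadata):
    """Serialize the metadata as a JSON-LD script tag for an HTML page."""
    body = json.dumps(metadata, indent=2)
    return '<script type="application/ld+json">\n' + body + "\n</script>"


# Placeholder values, not a real data set.
example = build_dataset_jsonld(
    name="Example cat population counts",
    description="Placeholder description of a small tabular data set.",
    url="https://example.org/datasets/cats",
    publisher="Example Statistics Office",
    download_url="https://example.org/datasets/cats.csv",
)
snippet = to_script_tag(example)
print(snippet)
```

The printed snippet would go into the HTML of the page that hosts the data set, where a crawler can read the description alongside the page itself.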