Google has developed a search engine to allow researchers, scientists and journalists to find the data required for their work more easily.
Dataset Search aims to provide access to “millions of datasets” from many thousands of data repositories on the web in addition to the information published by local and national governments around the world.
The new functionality, which works in a similar way to Google Scholar, is outlined in a blog post published to coincide with the launch. “Dataset Search lets you find datasets wherever they’re hosted, whether it’s a publisher’s site, a digital library, or an author’s personal web page,” writes Natasha Noy, a research scientist at Google AI, who was involved in the tool’s development.
To create Dataset Search, Google developed guidelines for dataset providers to describe their data in a way that lets Google (and other search engines) better understand the content of their pages. These guidelines cover “salient” information about datasets: who created the dataset, when it was published, how the data was collected, and what the terms are for using the data.
Google then collects and links this information, analyses where different versions of the same dataset might be, and finds publications that may be describing or discussing the dataset. Google said its approach is based on an open standard for describing this information (schema.org) and anybody who publishes data can describe their dataset this way.
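As a rough illustration of what such a description might look like, the sketch below builds a schema.org `Dataset` record as JSON-LD in Python. The property names (`creator`, `datePublished`, `measurementTechnique`, `license`) come from the schema.org vocabulary and are mapped here to the four pieces of information listed above. The helper `build_dataset_jsonld`, the dataset, the organisation and the example.org URL are hypothetical and are not taken from Google's guidelines.

```python
import json


def build_dataset_jsonld(name, description, url, creator,
                         date_published, collection_method, license_url):
    """Return a schema.org Dataset description as a JSON-LD string.

    Hypothetical helper for illustration only.
    """
    record = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        # Who created the dataset
        "creator": {"@type": "Organization", "name": creator},
        # When it was published
        "datePublished": date_published,
        # How the data was collected
        "measurementTechnique": collection_method,
        # Terms for using the data
        "license": license_url,
    }
    return json.dumps(record, indent=2)


# All values below are made up for the example.
example = build_dataset_jsonld(
    name="Example City Air Quality Readings",
    description="Hourly air quality readings from roadside sensors.",
    url="https://example.org/datasets/air-quality",
    creator="Example City Council",
    date_published="2018-09-05",
    collection_method="Roadside sensor network, hourly sampling",
    license_url="https://creativecommons.org/publicdomain/zero/1.0/",
)
print(example)
```

A publisher would typically embed output like this in the dataset's web page inside a `<script type="application/ld+json">` element, where search engine crawlers can read it.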
“We encourage dataset providers, large and small, to adopt this common standard so that all datasets are part of this robust ecosystem,” continued Noy.
This new release includes references to most datasets in environmental and social sciences, as well as data from other disciplines, including government data and data provided by news organisations such as ProPublica.
Data from Nasa and NOAA, as well as from academic repositories such as Harvard’s Dataverse and the Inter-university Consortium for Political and Social Research (ICPSR), can also be accessed.
As more data repositories use the schema.org standard to describe their datasets, the variety and coverage of datasets that users will find in Dataset Search will continue to grow.
“A search tool like this one is only as good as the metadata that data publishers are willing to provide,” Noy concluded. “We hope to see many of you use the open standards to describe your data, enabling our users to find the data that they are looking for.”