A New Search Engine – Google Dataset Search
This Wednesday (Sep 5, 2018), Google launched a new search engine named Google Dataset Search. Dataset Search allows users to search for datasets across multiple repositories on the Web.
It is much like Kaggle, where you can find and publish high-quality datasets to help you explore data and build models or data science projects.
On Kaggle you can also connect with and learn from many data scientists and machine learning engineers about various problems and new approaches.
In a recent post, Natasha Noy, a Research Scientist at Google AI, says:
“Google Dataset Search lets you find datasets wherever they’re hosted, whether it’s a publisher’s site, a digital library, or an author’s personal web page.”
It is very similar to how ‘Google Scholar’ works.
Reach and Transparency –
The new search engine gives you access to datasets spread across the internet, including data from Kaggle, NASA, and NOAA.
You can search for any type of data, such as AI, environmental, social science, documentary, civic, and government data, and the matching datasets and their available information will be displayed.
This differs from a typical search engine because, in many cases, information about specific datasets is neither linked nor indexed.
That makes discovering data slow and tedious, which is why Google came up with a single interface dedicated to dataset discovery.
Furthermore, it gives data publishers a search engine of their own, which helps create a better information-sharing ecosystem.
It also encourages them to follow best practices for data storage and publication. It is an excellent way for scientists to show the impact of their work through citation of the datasets they have produced, by making their data searchable and visible to users.
Phase and Approach –
The Google Dataset Search engine is currently in beta, so you may not find rich results for every dataset yet.
However, we expect more features and better results from this search engine in the future. Experiments with new approaches are ongoing to improve its range, accuracy, visibility, and scalability.
The Google Dataset Search engine understands the structured data in web pages by reading either schema.org Dataset markup or the equivalent structures in the W3C's DCAT (Data Catalog Vocabulary) format.
Moreover, for the search engine to return the correct dataset, the publisher needs to provide explicit metadata.
Describe your data accurately so that searchers can find it. Provide each dataset with correct supporting information, such as its name, its creator, a description of the data, and the format in which it is distributed.
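To make the metadata advice above more concrete, here is a minimal sketch in Python of a schema.org Dataset description written as JSON-LD. The property names (name, description, creator, distribution, encodingFormat, contentUrl) come from the schema.org vocabulary. The helper functions, the sample dataset, the creator, and the example.org URL are all hypothetical and only for illustration. Check Google's own guidelines for which properties are actually required.

```python
import json


def build_dataset_jsonld(name, description, creator_name, content_url, encoding_format):
    """Return a schema.org Dataset description as a JSON-LD string.

    Hypothetical helper for illustration; only the property names are
    taken from the schema.org vocabulary.
    """
    metadata = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        # The creator is modeled as an Organization here (an assumption);
        # schema.org also allows a Person.
        "creator": {"@type": "Organization", "name": creator_name},
        # Each distribution says where the data lives and in which format.
        "distribution": [
            {
                "@type": "DataDownload",
                "encodingFormat": encoding_format,
                "contentUrl": content_url,
            }
        ],
    }
    return json.dumps(metadata, indent=2)


def wrap_in_script_tag(jsonld):
    """Wrap JSON-LD in the script tag used to embed it in an HTML page."""
    return '<script type="application/ld+json">\n' + jsonld + "\n</script>"


if __name__ == "__main__":
    # All values below are made up for the example.
    snippet = build_dataset_jsonld(
        name="Daily rainfall measurements",
        description="Daily rainfall totals collected at a single weather station.",
        creator_name="Example Weather Lab",
        content_url="https://example.org/data/rainfall.csv",
        encoding_format="CSV",
    )
    print(wrap_in_script_tag(snippet))
```

The printed snippet would be placed in the HTML of the page that describes the dataset, so that a crawler can read the name, creator, description, and format without guessing.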
If you need a clearer understanding of how to prepare your dataset for optimum visibility in the search engine, you may want to visit Google's developers site, where you can find guidelines, ask questions, and give feedback.