Wednesday, 14 May 2025

How to Add Public Datasets to BigQuery Studio for Easy Data Exploration

Google Cloud provides public datasets that can be easily accessed and queried in BigQuery Studio. These datasets are useful for data analysis, machine learning, and business intelligence. Below is a detailed guide on how to search, add, and use a public dataset in BigQuery Studio.

Step 1: Open BigQuery Studio

Log in to the Google Cloud console.

Navigate to BigQuery Studio.

In the Explorer panel on the left, click the + ADD button.

A pop-up window will appear, allowing you to add datasets.


Step 2: Search for Public Datasets

In the search box, type "public" and press Enter. 

From the results, select Public Datasets. This will redirect you to the Google Cloud Marketplace, where various public datasets are listed.  



Step 3: Select a Dataset from Google Cloud Marketplace

Browse through the available public datasets in the Marketplace.

Select a dataset that fits your needs (for example, climate, healthcare, or financial data).

Click on the dataset to open its Product Details page.

 

Step 4: Review Dataset Details

The Product Details section provides information such as:

·      Dataset description

·      Schema and table structure

·      Usage policies and licensing

Review the details to ensure the dataset fits your requirements.
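Once the dataset is open in BigQuery Studio, you can also check which tables it contains and how large they are with a short query. This is a sketch that uses the baseball dataset queried later in this guide; swap in the dataset you picked. It reads BigQuery's `__TABLES__` metadata table, which returns one row per table in a dataset.

```sql
-- Lists the tables in the baseball dataset with their row counts and
-- sizes, largest first. Replace "baseball" with your chosen dataset.
SELECT
  table_id,
  row_count,
  ROUND(size_bytes / POW(1024, 2), 2) AS size_mb
FROM `bigquery-public-data.baseball.__TABLES__`
ORDER BY size_bytes DESC;
```

Knowing a table's size up front helps you estimate what a full-table query will cost.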

 

Step 5: Add the Dataset to BigQuery Studio

Click View Dataset.

This opens the dataset in BigQuery Studio and makes it available for querying. The bigquery-public-data project is added to the Explorer panel, where you can expand it to browse its datasets.

As the Explorer panel shows, all of these datasets belong to the project ‘bigquery-public-data’; the data stays in that project rather than being copied into yours.

 

Step 6: Start Querying the Public Dataset

Once the dataset is added, click a table in the Explorer panel to view its schema.

Use the SQL Query Editor to start analyzing the data.

Run SQL queries directly on the dataset to extract insights.
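The table's structure can also be read with SQL instead of the Schema tab. The sketch below queries the dataset's `INFORMATION_SCHEMA.COLUMNS` view for the games_post_wide table used in the example that follows; change the table name to inspect a different table.

```sql
-- Lists each column of games_post_wide with its data type and
-- nullability, in the order the columns appear in the table.
SELECT
  column_name,
  data_type,
  is_nullable
FROM `bigquery-public-data.baseball.INFORMATION_SCHEMA.COLUMNS`
WHERE table_name = 'games_post_wide'
ORDER BY ordinal_position;
```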

For example, run the following query in the editor:

 

SELECT * FROM `bigquery-public-data.baseball.games_post_wide` LIMIT 10;

 

This SQL query retrieves all columns from the games_post_wide table, which is part of the bigquery-public-data.baseball dataset. It limits the result to 10 rows.
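One caution: in BigQuery, `LIMIT` caps the rows returned, not the data read, so `SELECT *` with `LIMIT 10` still scans every column of the table. The editor shows an estimate of the bytes a query will process before you run it, and the table's Preview tab shows sample rows without running a query. To see how many rows the table holds before exploring further, a minimal query is:

```sql
-- Returns the total number of rows in games_post_wide.
SELECT COUNT(*) AS row_count
FROM `bigquery-public-data.baseball.games_post_wide`;
```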

In the above query, the table name has three parts:

·      bigquery-public-data (Project): the Google Cloud project that hosts the public datasets. Google pays for storing them; you pay only for the queries you run, within BigQuery's normal pricing and free tier.

 

·      baseball (Dataset): baseball is a dataset inside the bigquery-public-data project.

 

·      games_post_wide (Table): games_post_wide is a table within the baseball dataset.
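The backticks let an identifier contain characters such as the hyphens in bigquery-public-data, which is why the example wraps the full path in them. As a sketch of an equivalent form, you can quote only the project ID, because baseball and games_post_wide are ordinary identifiers:

```sql
-- Same table as the earlier query, but only the hyphenated project ID
-- is quoted; the dataset and table names need no backticks.
SELECT *
FROM `bigquery-public-data`.baseball.games_post_wide
LIMIT 10;
```

Quoting the whole path, as the original query does, remains the simpler habit.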

 
