Google Cloud provides public datasets that can be
easily accessed and queried in BigQuery Studio. These datasets are useful for
data analysis, machine learning, and business intelligence. Below is a detailed
guide on how to search, add, and use a public dataset in BigQuery Studio.
Step 1: Open BigQuery Studio
Log in to Google Cloud Console.
Navigate to BigQuery Studio.
In the Explorer panel on the left, click the + ADD button.
A pop-up window will appear, allowing you to add datasets.
Step 2: Search for Public Datasets
In the search box, type "public" and press Enter.
From the results, select Public Datasets. This will redirect you to the Google Cloud Marketplace, where various public datasets are listed.
Step 3: Select a Dataset from Google Cloud Marketplace
Browse through the available public datasets in the Marketplace.
Select a dataset that fits your needs (e.g., climate data, healthcare data, or financial records).
Click on the dataset to open its Product Details page.
Step 4: Review Dataset Details
The Product Details section provides information such as:
· Dataset description
· Schema and table structure
· Usage policies and licensing
Review the details to ensure the dataset fits your requirements.
Step 5: Add the Dataset to BigQuery Studio
Click View Dataset.
This opens the dataset in BigQuery Studio and makes it available for querying. The dataset now appears in the Explorer panel.
Note that public datasets are not copied into your own project. They are listed under the project 'bigquery-public-data'.
Step 6: Start Querying the Public Dataset
Once the dataset is added, click on a table to view its structure.
Use the SQL Query Editor to start analyzing the data.
Run SQL queries directly on the dataset to extract insights. For example:
SELECT * FROM `bigquery-public-data.baseball.games_post_wide` LIMIT 10;
This query retrieves all columns from the games_post_wide table in the bigquery-public-data.baseball dataset and limits the result to 10 rows.
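If you preview several tables, it can help to generate this kind of query from a small helper. The following is a minimal Python sketch, not part of BigQuery itself: the function name build_preview_query and its parameters are hypothetical, and it only builds the SQL text. It does not connect to BigQuery or run anything.

```python
def build_preview_query(table_id: str, limit: int = 10) -> str:
    """Return a SELECT * query that previews the first `limit` rows of a table.

    `table_id` is assumed to be a fully qualified name in the form
    project.dataset.table, written without surrounding backticks.
    """
    if limit < 1:
        raise ValueError("limit must be a positive integer")
    if "`" in table_id:
        raise ValueError("table_id must not contain backticks")
    # Wrap the name in backticks so the hyphenated project ID stays quoted.
    return f"SELECT * FROM `{table_id}` LIMIT {limit};"


# Rebuild the preview query used in this guide.
query = build_preview_query("bigquery-public-data.baseball.games_post_wide")
print(query)
```

The printed text can be pasted into the SQL Query Editor. Changing the limit argument changes how many rows the preview returns.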
The table reference bigquery-public-data.baseball.games_post_wide has three parts:
· bigquery-public-data (Project): the Google Cloud project under which Google hosts its public datasets.
· baseball (Dataset): a dataset inside the bigquery-public-data project.
· games_post_wide (Table): a table within the baseball dataset.
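The three-part structure described above can be illustrated with a short Python sketch that splits a fully qualified table name into its project, dataset, and table. The names TableRef and parse_table_id are hypothetical helpers introduced only for this illustration. They are not part of any Google Cloud library.

```python
from typing import NamedTuple


class TableRef(NamedTuple):
    """The three parts of a fully qualified BigQuery table name."""
    project: str
    dataset: str
    table: str


def parse_table_id(table_id: str) -> TableRef:
    """Split 'project.dataset.table' into its parts.

    Surrounding backticks, as used in SQL, are stripped first.
    """
    parts = table_id.strip("`").split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected project.dataset.table, got: {table_id!r}")
    return TableRef(*parts)


ref = parse_table_id("`bigquery-public-data.baseball.games_post_wide`")
print(ref.project)  # bigquery-public-data
print(ref.dataset)  # baseball
print(ref.table)    # games_post_wide
```

This simple split assumes that none of the three parts contains a dot, which holds for the table used in this guide.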