When working with BigQuery, data often comes in different formats, and converting data types is a common task. However, using CAST directly can lead to errors if the data is not in the expected format. This guide introduces SAFE_CAST, a safer alternative that prevents query failures by replacing conversion errors with NULL.
This tutorial walks you through:
· Creating a table in BigQuery
· Inserting sample data (including incorrect values)
· Using SAFE_CAST to handle invalid conversions safely
· Understanding when to use SAFE_CAST vs. CAST
By the end of this guide, you’ll know how to perform type conversions in BigQuery without worrying about query failures.
Using SAFE_CAST in BigQuery
The SAFE_CAST function in BigQuery is used to convert one data type to another without causing errors. If the conversion fails, instead of throwing an error, SAFE_CAST replace the value with NULL. This is useful when working with datasets that may have inconsistent or incorrect data formats.
Syntax
SAFE_CAST(expression AS target_data_type)
Here,
· expression: The value to be converted.
· target_data_type: The data type to which the value should be converted.
Step-by-Step Guide
Step 1: Create a Table
Let's create a table named employee_data with columns of different data types.
CREATE TABLE my_project.my_dataset.employee_data ( emp_id STRING, age STRING, -- Storing age as a string (to demonstrate SAFE_CAST) salary STRING -- Storing salary as a string (to demonstrate SAFE_CAST) );
Note: We are using STRING type for age and salary to simulate real-world scenarios where data may be stored incorrectly.
Step 2: Insert Sample Records
We will insert some records, including some invalid values that cannot be converted to integers or floats.
INSERT INTO my_project.my_dataset.employee_data (emp_id, age, salary) VALUES ('E101', '30', '50000.75'), -- Valid age and salary ('E102', 'Twenty-five', '60000'), -- Invalid age (non-numeric string) ('E103', '40', 'Invalid'), -- Invalid salary (non-numeric string) ('E104', '35', '70000.50'); -- Valid age and salary
Let’s print employee_data content by executing below command.
SELECT * FROM my_project.my_dataset.employee_data;
Step 3: Using SAFE_CAST to Convert Data Types
Now, we will use SAFE_CAST to safely convert age and salary to INT64 and FLOAT64, respectively.
SELECT emp_id, SAFE_CAST(age AS INT64) AS age_converted, SAFE_CAST(salary AS FLOAT64) AS salary_converted FROM my_project.my_dataset.employee_data;
You can observe following output.
As you see above result,
· The valid values are converted correctly.
· The invalid values are replaced with NULL instead of throwing errors.
When to Use SAFE_CAST?
· Handling Invalid Data: Prevents errors when encountering incorrect data formats.
· Data Cleaning: Helps in processing raw data without stopping execution.
· Working with External Data Sources: Useful when data comes from external sources with unknown formats.
Comparison: CAST vs. SAFE_CAST
Function |
Behavior |
CAST(value AS INT64) |
Throws an error if conversion fails. |
SAFE_CAST(value AS INT64) |
Returns NULL if conversion fails. |
In summary,
· SAFE_CAST is a safer alternative to CAST.
· It avoids query failures by replacing conversion errors with NULL.
· It is useful for cleaning and transforming data in BigQuery.
Previous Next Home
No comments:
Post a Comment