Wednesday, 4 June 2025

SAFE_CAST in BigQuery to perform Error-Free Data Conversion

When working with BigQuery, data often comes in different formats, and converting data types is a common task. However, using CAST directly can lead to errors if the data is not in the expected format. This guide introduces SAFE_CAST, a safer alternative that prevents query failures by replacing conversion errors with NULL. 

This tutorial walks you through:

·      Creating a table in BigQuery

·      Inserting sample data (including incorrect values)

·      Using SAFE_CAST to handle invalid conversions safely

·      Understanding when to use SAFE_CAST vs. CAST

 

By the end of this guide, you’ll know how to perform type conversions in BigQuery without worrying about query failures.

 

Using SAFE_CAST in BigQuery

The SAFE_CAST function in BigQuery is used to convert one data type to another without causing errors. If the conversion fails, instead of throwing an error, SAFE_CAST replace the value with NULL. This is useful when working with datasets that may have inconsistent or incorrect data formats.

 

Syntax

SAFE_CAST(expression AS target_data_type)

Here,

·      expression: The value to be converted.

·      target_data_type: The data type to which the value should be converted.

 

Step-by-Step Guide

Step 1: Create a Table

Let's create a table named employee_data with columns of different data types.

CREATE TABLE my_project.my_dataset.employee_data (
    emp_id STRING,
    age STRING,  -- Storing age as a string (to demonstrate SAFE_CAST)
    salary STRING  -- Storing salary as a string (to demonstrate SAFE_CAST)
);

Note: We are using STRING type for age and salary to simulate real-world scenarios where data may be stored incorrectly.

 

Step 2: Insert Sample Records

We will insert some records, including some invalid values that cannot be converted to integers or floats.

INSERT INTO my_project.my_dataset.employee_data (emp_id, age, salary)
VALUES
    ('E101', '30', '50000.75'),    -- Valid age and salary
    ('E102', 'Twenty-five', '60000'), -- Invalid age (non-numeric string)
    ('E103', '40', 'Invalid'),      -- Invalid salary (non-numeric string)
    ('E104', '35', '70000.50');     -- Valid age and salary

 

Let’s print employee_data content by executing below command.

 

SELECT * FROM my_project.my_dataset.employee_data;

 


Step 3: Using SAFE_CAST to Convert Data Types

Now, we will use SAFE_CAST to safely convert age and salary to INT64 and FLOAT64, respectively.

 

SELECT 
    emp_id,
    SAFE_CAST(age AS INT64) AS age_converted,
    SAFE_CAST(salary AS FLOAT64) AS salary_converted
FROM my_project.my_dataset.employee_data;

 

You can observe following output.

 


 

As you see above result,

·      The valid values are converted correctly.

·      The invalid values are replaced with NULL instead of throwing errors.

 

When to Use SAFE_CAST?

·      Handling Invalid Data: Prevents errors when encountering incorrect data formats.

·      Data Cleaning: Helps in processing raw data without stopping execution.

·      Working with External Data Sources: Useful when data comes from external sources with unknown formats.

 

Comparison: CAST vs. SAFE_CAST

Function

Behavior

CAST(value AS INT64)   

Throws an error if conversion fails.

SAFE_CAST(value AS INT64)  

Returns NULL if conversion fails.

 

In summary,

·      SAFE_CAST is a safer alternative to CAST.

·      It avoids query failures by replacing conversion errors with NULL.

·      It is useful for cleaning and transforming data in BigQuery.

 


  

Previous                                                    Next                                                    Home

No comments:

Post a Comment