Tuesday, 10 June 2025

BigQuery's partition_expiry_days Explained

When working with large datasets in Google BigQuery, using partitioned tables is a great way to improve performance and reduce costs. But as your data grows, it's also important to manage the storage lifecycle of those partitions. That’s where the partition_expiration_days option comes in.

In this post, we’ll walk you through what it is, how it works, and how to use it effectively to manage your data.

 

What is partition_expiration_days?

partition_expiration_days is an option in BigQuery that lets you set how long each partition should live after its creation. Once the specified number of days passes, that partition is automatically deleted.

 

This helps you:

·      Keep your data storage clean and cost-effective.

·      Avoid manual deletion of old data.

·      Stay compliant with data retention policies.

 

How It Works (with Precedence)?

BigQuery gives you multiple levels where you can define expiration for partitions:

 

·      Partition-level expiration: Most specific and highest priority.

·      Table-level expiration (partition_expiration_days): Applies to all partitions unless overridden by the partition itself.

·      Dataset-level default: Lowest priority.

 

If no expiration is set at any of these levels, partitions will never expire unless you delete them manually.

 

Following table summarizes BigQuery Expiration Behavior Table.

 

Dataset Expiration

Table Expiration

Partition Expiration

Resulting Behavior

Not set

Not set

Not set

Partitions never expire. Manual deletion is required.

10 days

Not set

Not set

Table expires after 10 days due to dataset expiration. All partitions deleted with table.

10 days

5 days

Not set

Table expires in 5 days. All partitions deleted with table (table expiration overrides dataset).

Not set

5 days

10 days

Table expires in 5 days. All partitions deleted with the table, despite longer partition expiration.

Not set

Not set

7 days

Each partition expires individually 7 days after creation. Table remains intact.

30 days

Not set

10 days

Table expires in 30 days (from dataset). Partitions expire in 10 days.

20 days

3 days

15 days

Table expires in 3 days. Partitions deleted with table (table expiration takes precedence).

Not set

10 days

5 days

Partitions expire 5 days after creation. Table expires in 10 days.

60 days

7 days

Not set

Table expires in 7 days. All partitions deleted with table.

 

How to set expiry at dataset?

At the time of dataset creation, you can set the expiration under Advanced Options -> Default table expiration section.

 


Or you can set the same using ALTER statement as well.

 

Set Default Expiration at Dataset Level

ALTER SCHEMA `my_project.my_dataset`
SET OPTIONS (
  default_table_expiration_days = 30
);

Set expiration for a table

CREATE TABLE `my_project.my_dataset.my_table`
(
  id INT64,
  name STRING
)
OPTIONS (
  expiration_timestamp = TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
);

Set Expiration on Existing Table (SQL)

ALTER TABLE `my_project.my_dataset.my_table`
SET OPTIONS (
  expiration_timestamp = TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 10 DAY)
);

To remove expiration

ALTER TABLE `my_project.my_dataset.my_table`
SET OPTIONS (expiration_timestamp = NULL);

How to set the partition expiration of a table?

You can set expiration during or after table creation.

ALTER TABLE `my_project.my_dataset.table`
SET OPTIONS (
  partition_expiration_days = 7
);

  

Previous                                                    Next                                                    Home

No comments:

Post a Comment