When working with large datasets in Google BigQuery, using partitioned tables is a great way to improve performance and reduce costs. But as your data grows, it's also important to manage the storage lifecycle of those partitions. That’s where the partition_expiration_days option comes in.
In this post, we’ll walk you through what it is, how it works, and how to use it effectively to manage your data.
What is partition_expiration_days?
partition_expiration_days is an option in BigQuery that lets you set how long each partition should live after its creation. Once the specified number of days passes, that partition is automatically deleted.
This helps you:
· Keep your data storage clean and cost-effective.
· Avoid manual deletion of old data.
· Stay compliant with data retention policies.
How It Works (with Precedence)?
BigQuery gives you multiple levels where you can define expiration for partitions:
· Partition-level expiration: Most specific and highest priority.
· Table-level expiration (partition_expiration_days): Applies to all partitions unless overridden by the partition itself.
· Dataset-level default: Lowest priority.
If no expiration is set at any of these levels, partitions will never expire unless you delete them manually.
Following table summarizes BigQuery Expiration Behavior Table.
Dataset Expiration |
Table Expiration |
Partition Expiration |
Resulting Behavior |
Not set |
Not set |
Not set |
Partitions never expire. Manual deletion is required. |
10 days |
Not set |
Not set |
Table expires after 10 days due to dataset expiration. All partitions deleted with table. |
10 days |
5 days |
Not set |
Table expires in 5 days. All partitions deleted with table (table expiration overrides dataset). |
Not set |
5 days |
10 days |
Table expires in 5 days. All partitions deleted with the table, despite longer partition expiration. |
Not set |
Not set |
7 days |
Each partition expires individually 7 days after creation. Table remains intact. |
30 days |
Not set |
10 days |
Table expires in 30 days (from dataset). Partitions expire in 10 days. |
20 days |
3 days |
15 days |
Table expires in 3 days. Partitions deleted with table (table expiration takes precedence). |
Not set |
10 days |
5 days |
Partitions expire 5 days after creation. Table expires in 10 days. |
60 days |
7 days |
Not set |
Table expires in 7 days. All partitions deleted with table. |
How to set expiry at dataset?
At the time of dataset creation, you can set the expiration under Advanced Options -> Default table expiration section.
Or you can set the same using ALTER statement as well.
Set Default Expiration at Dataset Level
ALTER SCHEMA `my_project.my_dataset` SET OPTIONS ( default_table_expiration_days = 30 );
Set expiration for a table
CREATE TABLE `my_project.my_dataset.my_table` ( id INT64, name STRING ) OPTIONS ( expiration_timestamp = TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 7 DAY) );
Set Expiration on Existing Table (SQL)
ALTER TABLE `my_project.my_dataset.my_table` SET OPTIONS ( expiration_timestamp = TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 10 DAY) );
To remove expiration
ALTER TABLE `my_project.my_dataset.my_table` SET OPTIONS (expiration_timestamp = NULL);
How to set the partition expiration of a table?
You can set expiration during or after table creation.
ALTER TABLE `my_project.my_dataset.table` SET OPTIONS ( partition_expiration_days = 7 );
Previous Next Home
No comments:
Post a Comment