1. What is Ingestion-Time Partitioning?
In BigQuery, ingestion-time partitioning automatically divides a table into partitions based on the date and time the data that was loaded into BigQuery.
Unlike column-based partitioning (where you specify a column like order_date), ingestion-time partitioning happens automatically without needing a date column.
This is useful for log data, event streams, and real-time data ingestion where records arrive at different times.
For example, imagine you're storing logs from a web application. Instead of scanning an entire log table, ingestion-time partitioning lets you query only the logs that arrived on a specific date.
2. Key System-Defined Columns: _PARTITIONTIME & _PARTITIONDATE
BigQuery automatically creates special columns when you use ingestion-time partitioning:
· _PARTITIONTIME
· _PARTITIONDATE
2.1 _PARTITIONTIME (Timestamp-Based Partitioning)
· This is a hidden pseudo column that stores the exact timestamp when data was loaded into BigQuery.
· It is stored as a TIMESTAMP type.
· Queries can filter by _PARTITIONTIME to retrieve only relevant partitions.
Follow below step-by-step procedure to understand this better.
Step 1: Create a Partitioned Table on _PARTITIONTIME
We will create a partitioned table using _PARTITIONTIME so that BigQuery automatically assigns partitions based on the ingestion time.
CREATE TABLE my_project.my_dataset.partitioned_time_table ( id INT64, name STRING ) PARTITION BY DATE(_PARTITIONTIME);
You can partition on the timestamp (_PARTITIONTIME) directly, convert it to Date.
When you check the table details section, you can see below metadata.
Step 2: Let’s insert some data into the table.
INSERT INTO my_project.my_dataset.partitioned_time_table (id, name) VALUES (1, "Ram"), (2, "Krishna"), (3, "Chamu");
Step 3: Querying _PARTITIONTIME and _PARTITIONDATE
Now, let's check what _PARTITIONTIME and _PARTITIONDATE look like:
SELECT id, name, _PARTITIONDATE, _PARTITIONTIME FROM my_project.my_dataset.partitioned_time_table;
Step 4: Querying Specific Partitions
Query a Specific Day's Partition (Efficient).
SELECT * FROM my_project.my_dataset.partitioned_time_table WHERE _PARTITIONDATE = "2025-04-03";
You can see following results.
Query Based on _PARTITIONTIME (Exact Time Filtering)
SELECT * FROM my_project.my_dataset.partitioned_time_table WHERE _PARTITIONTIME BETWEEN TIMESTAMP("2025-04-03 00:00:00 UTC") AND TIMESTAMP("2025-04-03 23:59:59 UTC");
2.2 Create a Table Partitioned by _PARTITIONDATE
To create an ingestion-time partitioned table, you need to partition by _PARTITIONDATE, which is the DATE version of _PARTITIONTIME.
Follow below step-by-step procedure to understand this better.
Step 1: Create partitioned_date_table by executing below statement.
CREATE TABLE my_project.my_dataset.partitioned_date_table ( id INT64, name STRING ) PARTITION BY _PARTITIONDATE;
Step 2: Insert Data into the Partitioned Table
You do not insert _PARTITIONDATE. BigQuery assigns it automatically.
INSERT INTO my_project.my_dataset.partitioned_date_table (id, name) VALUES (1, "Ram"), (2, "Krishna"), (3, "Chamu");
Step 3: Query Data with _PARTITIONTIME and _PARTITIONDATE
You can now retrieve partition metadata:
SELECT id, name, _PARTITIONDATE, _PARTITIONTIME FROM my_project.my_dataset.partitioned_date_table ;
Above snippet return following result.
Key Differences
Feature |
_PARTITIONTIME |
_PARTITIONDATE |
Type |
TIMESTAMP |
DATE |
Precision |
Full timestamp |
Truncated to YYYY-MM-DD |
Partitioning |
Cannot partition by this |
Used for partitioning |
Query Speed |
May scan full table Faster |
scans only partitions |
Use Case |
Precise time-based filtering |
Efficient partition pruning |
In summary,
· Use _PARTITIONDATE for query optimization (partition pruning).
· Use _PARTITIONTIME for fine-grained time filtering when needed.
Previous Next Home
No comments:
Post a Comment