Partitioning in BigQuery helps to improve query performance and manage costs by dividing a large table into smaller partitions based on a specific column.
BigQuery supports date-based partitioning using:
· DATE column partitioning: Based on a column of type DATE.
· TIMESTAMP or DATETIME column partitioning: Uses a column of type TIMESTAMP or DATETIME, but it is converted internally to DATE while partitioning.
· Ingestion-time partitioning: Automatically partitions based on the DATE of data ingestion.
Follow below step-by-step procedure to understand “How can we partition the table by date column?”.
Step 1: Create a Partitioned Table
We'll create a table sales_partitioned partitioned by the order_date column.
CREATE OR REPLACE TABLE my_project.my_dataset.sales_partitioned ( order_id STRING, customer_name STRING, order_date DATE, order_amount FLOAT64 ) PARTITION BY order_date;
Go to table details section, you can confirm that the table is partitioned on order_date field.
Step 2: Let’s insert some records into it.
INSERT INTO my_project.my_dataset.sales_partitioned (order_id, customer_name, order_date, order_amount) VALUES ('O1001', 'Aarav', DATE '2024-03-15', 250.75), -- Valid ('O1002', 'Bhavya', NULL, 120.00), -- NULL order_date ('O1003', 'Charan', DATE '9100-02-23', 310.50), -- Out-of-range year ('O1004', 'Divya', DATE '1990-05-10', 500.25), -- Old date before partitions ('O1005', 'Esha', DATE '2025-07-21', 600.40), -- Future date ('O1006', 'Farhan', DATE '2023-12-31', 220.00), -- Valid end-of-year date ('O1007', 'Gaurav', DATE '2020-02-29', 430.10), -- Leap year date ('O1008', 'Hina', DATE '2022-06-10', 350.25), -- Valid ('O1009', 'Ishaan', DATE '2000-01-01', 150.50), -- Old date ('O1010', 'Jiya', DATE '9999-12-31', 890.75), -- Extreme future date ('O1011', 'Kunal', NULL, 290.00), -- NULL order_date ('O1012', 'Lavanya', DATE '2030-08-15', 120.30),-- Future date (Independence Day) ('O1013', 'Manoj', DATE '2015-04-01', 400.00), -- Old but recent date ('O1014', 'Neha', DATE '2026-11-05', 340.60), -- Valid future date ('O1015', 'Omkar', DATE '1750-12-10', 510.90), -- Extremely old date (invalid for practical use) ('O1016', 'Pooja', DATE '2077-01-01', 600.10), -- Future date ('O1017', 'Qasim', DATE '2024-04-20', 120.75), -- Valid ('O1018', 'Riya', DATE '2011-07-07', 250.00), -- Valid past date ('O1019', 'Sohan', DATE '9000-05-05', 999.99), -- Another out-of-range date ('O1020', 'Tanvi', DATE '2027-09-09', 875.25), -- Valid future date ('O1021', 'Umesh', DATE '2024-01-26', 450.60), -- Republic Day India ('O1022', 'Vanya', DATE '1995-08-24', 320.10), -- Old date ('O1023', 'Wasim', DATE '2050-10-31', 710.55), -- Future Halloween ('O1024', 'Xavier', DATE '1800-06-15', 230.20), -- Very old date ('O1025', 'Yash', NULL, 540.00), -- NULL order_date ('O1026', 'Zoya', DATE '2024-02-14', 660.90); -- Valentine's Day
Step 3: Querying with Partition Filters (Best Practice)
When querying partitioned tables, try to use filters on the partitioned column to get better performance.
SELECT * FROM my_project.my_dataset.sales_partitioned WHERE order_date BETWEEN '2024-01-01' AND '2024-12-31';
You can get following results.
Let’s query ‘INFORMATION_SCHEMA.PARTITIONS’ for the table 'sales_partitioned' to confirm partitions.
SELECT partition_id, total_rows FROM my_project.my_dataset.INFORMATION_SCHEMA.PARTITIONS WHERE table_name = 'sales_partitioned';
Above query return following response.
Everything looks fine, but what about two special partitions
a. __UNPARTITIONED__
b. __NULL__
__UNPARTITIONED__ Partition (5 records)
This partition contains records that don't fit into any of your defined partition ranges. Based on the sample data, these would be records with:
· Dates that are outside the valid range for DATE partitioning (like year 9100 or 9000)
· Dates that are extremely old (like 1750 or 1800) if your partition range doesn't cover them
· Dates that might be valid DATE types but fall outside your partition boundaries
In the sample data, the 5 records that would go to __UNPARTITIONED__ are likely:
· 'O1003' (DATE '9100-02-23') - Invalid future date
· 'O1015' (DATE '1750-12-10') - Extremely old date
· 'O1019' (DATE '9000-05-05') - Invalid future date
· 'O1024' (DATE '1800-06-15') - Very old date
· 'O1010' (DATE '9999-12-31') - Extreme future date
__NULL__ Partition (3 records)
This partition contains all records where the partition column (order_date) is NULL. In the sample data, these are clearly:
· 'O1002' (NULL order_date)
· 'O1011' (NULL order_date)
· 'O1025' (NULL order_date)
BigQuery creates these special partitions to:
· Maintain data integrity (not lose data that doesn't fit your partitioning scheme)
· Provide visibility into problematic or edge-case records
· Allow you to query these records separately if needed
No comments:
Post a Comment