Sunday, 8 June 2025

Partitioning by Date Columns in BigQuery

Partitioning in BigQuery helps to improve query performance and manage costs by dividing a large table into smaller partitions based on a specific column.

 

BigQuery supports date-based partitioning using:

 

·      DATE column partitioning: Based on a column of type DATE.

·      TIMESTAMP or DATETIME column partitioning: Uses a column of type TIMESTAMP or DATETIME, but it is converted internally to DATE while partitioning.

·      Ingestion-time partitioning: Automatically partitions based on the DATE of data ingestion.

 

Follow below step-by-step procedure to understand “How can we partition the table by date column?”.

 

Step 1: Create a Partitioned Table

We'll create a table sales_partitioned partitioned by the order_date column.

CREATE OR REPLACE TABLE my_project.my_dataset.sales_partitioned (
    order_id STRING,
    customer_name STRING,
    order_date DATE,
    order_amount FLOAT64
)
PARTITION BY order_date;

Go to table details section, you can confirm that the table is partitioned on order_date field. 

 


Step 2: Let’s insert some records into it.

INSERT INTO my_project.my_dataset.sales_partitioned (order_id, customer_name, order_date, order_amount) VALUES
    ('O1001', 'Aarav', DATE '2024-03-15', 250.75),   -- Valid
    ('O1002', 'Bhavya', NULL, 120.00),               -- NULL order_date
    ('O1003', 'Charan', DATE '9100-02-23', 310.50),  -- Out-of-range year
    ('O1004', 'Divya', DATE '1990-05-10', 500.25),   -- Old date before partitions
    ('O1005', 'Esha', DATE '2025-07-21', 600.40),    -- Future date
    ('O1006', 'Farhan', DATE '2023-12-31', 220.00),  -- Valid end-of-year date
    ('O1007', 'Gaurav', DATE '2020-02-29', 430.10),  -- Leap year date
    ('O1008', 'Hina', DATE '2022-06-10', 350.25),    -- Valid
    ('O1009', 'Ishaan', DATE '2000-01-01', 150.50),  -- Old date
    ('O1010', 'Jiya', DATE '9999-12-31', 890.75),    -- Extreme future date
    ('O1011', 'Kunal', NULL, 290.00),               -- NULL order_date
    ('O1012', 'Lavanya', DATE '2030-08-15', 120.30),-- Future date (Independence Day)
    ('O1013', 'Manoj', DATE '2015-04-01', 400.00),  -- Old but recent date
    ('O1014', 'Neha', DATE '2026-11-05', 340.60),   -- Valid future date
    ('O1015', 'Omkar', DATE '1750-12-10', 510.90),  -- Extremely old date (invalid for practical use)
    ('O1016', 'Pooja', DATE '2077-01-01', 600.10),  -- Future date
    ('O1017', 'Qasim', DATE '2024-04-20', 120.75),  -- Valid
    ('O1018', 'Riya', DATE '2011-07-07', 250.00),   -- Valid past date
    ('O1019', 'Sohan', DATE '9000-05-05', 999.99),  -- Another out-of-range date
    ('O1020', 'Tanvi', DATE '2027-09-09', 875.25),  -- Valid future date
    ('O1021', 'Umesh', DATE '2024-01-26', 450.60),  -- Republic Day India
    ('O1022', 'Vanya', DATE '1995-08-24', 320.10),  -- Old date
    ('O1023', 'Wasim', DATE '2050-10-31', 710.55),  -- Future Halloween
    ('O1024', 'Xavier', DATE '1800-06-15', 230.20), -- Very old date
    ('O1025', 'Yash', NULL, 540.00),               -- NULL order_date
    ('O1026', 'Zoya', DATE '2024-02-14', 660.90);  -- Valentine's Day

 

Step 3: Querying with Partition Filters (Best Practice)

When querying partitioned tables, try to use filters on the partitioned column to get better performance.

 

SELECT * FROM my_project.my_dataset.sales_partitioned
WHERE order_date BETWEEN '2024-01-01' AND '2024-12-31';

You can get following results.

 


 

Let’s query ‘INFORMATION_SCHEMA.PARTITIONS’ for the table 'sales_partitioned' to confirm partitions.

SELECT 
    partition_id,
    total_rows
FROM my_project.my_dataset.INFORMATION_SCHEMA.PARTITIONS
WHERE table_name = 'sales_partitioned';

 

Above query return following response.

 


 

Everything looks fine, but what about two special partitions

a.   __UNPARTITIONED__

b.   __NULL__

 

__UNPARTITIONED__ Partition (5 records)

This partition contains records that don't fit into any of your defined partition ranges. Based on the sample data, these would be records with:

 

·      Dates that are outside the valid range for DATE partitioning (like year 9100 or 9000)

·      Dates that are extremely old (like 1750 or 1800) if your partition range doesn't cover them

·      Dates that might be valid DATE types but fall outside your partition boundaries

 

In the sample data, the 5 records that would go to __UNPARTITIONED__ are likely:

 

·      'O1003' (DATE '9100-02-23') - Invalid future date

·      'O1015' (DATE '1750-12-10') - Extremely old date

·      'O1019' (DATE '9000-05-05') - Invalid future date

·      'O1024' (DATE '1800-06-15') - Very old date

·      'O1010' (DATE '9999-12-31') - Extreme future date

 

__NULL__ Partition (3 records)

This partition contains all records where the partition column (order_date) is NULL. In the sample data, these are clearly:

 

·      'O1002' (NULL order_date)

·      'O1011' (NULL order_date)

·      'O1025' (NULL order_date)

 

BigQuery creates these special partitions to:

 

·      Maintain data integrity (not lose data that doesn't fit your partitioning scheme)

·      Provide visibility into problematic or edge-case records

·      Allow you to query these records separately if needed

 


Previous                                                    Next                                                    Home

No comments:

Post a Comment