When working with large datasets in BigQuery, performance and cost optimization are essential. One effective way to improve performance especially for complex queries is by using materialized views.
In this post, I'll explain what materialized views are, how they differ from standard views, when to use them, and some key best practices and limitations.
1. What Are Materialized Views?
In BigQuery, a materialized view is a precomputed view that stores the results of a query on disk. This means the query doesn’t need to be executed every time you access the view, making the queries faster and more cost-effective.
By contrast, a standard view is simply a saved SQL query. When you query a standard view, BigQuery re-executes the underlying query every time, which can be slower and more expensive, especially if the query is complex.
2. How Do Materialized Views Work?
· When you create a materialized view, BigQuery runs the query and stores the result.
· Unlike standard views, materialized views do not automatically reflect changes made to the base table.
· The data remains the same until the view is refreshed, either manually or automatically (if auto-refresh is enabled).
· This makes materialized views ideal for scenarios where:
o The data doesn't change frequently.
o You want to speed up expensive, repeated queries.
3. Benefits Over Standard Views
· Performance: Materialized views respond much faster because the results are precomputed.
· Cost Savings: Since the query doesn't re-run every time, you're charged only for querying the stored result.
4. Key Points to Remember When Working with BigQuery
· Materialized view and base table must be in the same dataset : BigQuery requires that the materialized view and its base table live in the same dataset. This setup helps BigQuery quickly update the materialized view when the base table changes.
· Materialized views can only use one base table (no joins allowed): A materialized view in BigQuery can only be built from a single table. You can't use joins with other tables. These views are best for speeding up repeated queries on one table.
· Materialized views are mainly used for aggregation or filtering.: BigQuery materialized views work best when they include aggregate functions like SUM, COUNT, or MAX, or when they apply filters. These types of queries can be pre-computed to save time later.
· Partitioning must match the base table’s partitioning: If the base table is partitioned, the materialized view must use the same partition column. This helps keep queries fast and updates efficient.
· Clustering can be done on any column from the base table: You can choose any columns for clustering in a materialized view. They don’t need to match the clustering columns of the base table. Use clustering to speed up queries that filter or group by those columns.
5. Limitations of BigQuery Materialized Views
· Materialized views cannot be used in copy, import, or export jobs. You can't directly copy a materialized view or use it as a source or destination in import/export operations. These jobs only work with regular tables, not materialized views.
· If the base table is deleted, the materialized view breaks.
o The materialized view becomes invalid.
o Any queries or refresh operations on the view will fail.
o Even if you recreate the base table with the same name, the materialized view won’t automatically start working again. you'll need to recreate it too.
Follow below step-by-step procedure to experiment with materialized views.
Step 1: Create the Sales Table
CREATE TABLE `my_project.my_dataset.daily_sales` ( sale_date DATE, salesperson_name STRING, region STRING, amount NUMERIC );
Step 2: Insert Sample Data (with Indian names)
INSERT INTO `my_project.my_dataset.daily_sales` (sale_date, salesperson_name, region, amount) VALUES ('2024-04-01', 'Amit', 'North', 12000), ('2024-04-01', 'Priya', 'South', 18000), ('2024-04-02', 'Rajesh', 'North', 15000), ('2024-04-02', 'Anita', 'West', 20000), ('2024-04-03', 'Amit', 'North', 10000), ('2024-04-03', 'Priya', 'South', 22000), ('2024-04-04', 'Rajesh', 'North', 14000), ('2024-04-04', 'Anita', 'West', 21000);
Step 3: Create Materialized Views
a. Total Sales Per Salesperson
This view gives the total sales done by each salesperson.
CREATE MATERIALIZED VIEW `my_project.my_dataset.mv_total_sales_per_person` AS SELECT salesperson_name, SUM(amount) AS total_sales FROM `my_project.my_dataset.daily_sales` GROUP BY salesperson_name;
This is a classic aggregation use case. The view precomputes total sales per person. When you run a query to fetch totals, BigQuery serves from the precomputed results instead of scanning the whole table.
Let’s query all the records of ‘my_project.my_dataset.mv_total_sales_per_person` view by executing following sql statement.
SELECT * FROM my_project.my_dataset.mv_total_sales_per_person
b. Daily Sales Total
Get total sales for each day.
CREATE MATERIALIZED VIEW `my_project.my_dataset.mv_daily_sales_total` AS SELECT sale_date, SUM(amount) AS total_amount FROM `my_project.my_dataset.daily_sales` GROUP BY sale_date;
If you want to generate a daily report or chart, this materialized view speeds things up by aggregating data per day.
Let’s query all the records of ‘my_project.my_dataset.mv_daily_sales_total` view by executing following sql statement.
SELECT * FROM my_project.my_dataset.mv_daily_sales_total
c. Total Sales by Region
Summarize sales region-wise.
CREATE MATERIALIZED VIEW `my_project.my_dataset.mv_region_sales` AS SELECT region, SUM(amount) AS region_sales FROM `my_project.my_dataset.daily_sales` GROUP BY region;
This helps businesses to see which region is performing best. Again, it's aggregated and fast because of precomputation.
Let’s query all the records of ‘my_project.my_dataset.mv_region_sales’ view by executing following sql statement.
SELECT * FROM my_project.my_dataset.mv_region_sales
d. Top Salesperson Daily
This one is trickier because materialized views don't support non-aggregation queries or subqueries. So instead, we'll keep it simple:
CREATE MATERIALIZED VIEW `my_project.my_dataset.mv_daily_salesperson_summary` AS SELECT sale_date, salesperson_name, SUM(amount) AS daily_sales FROM `my_project.my_dataset.daily_sales` GROUP BY sale_date, salesperson_name;
You can use this view to later query for the top salesperson per day using standard SQL on top of this materialized view. The view itself aggregates daily per person.
Let’s query all the records of ‘my_project.my_dataset.mv_daily_salesperson_summary’ view by executing following sql statement.
SELECT * FROM my_project.my_dataset.mv_daily_salesperson_summary
All the four materialized views we created
· uses a single table with aggregation only.
· No joins or subqueries used, materialized views don’t support them.
In summary, Materialized views are a powerful tool in BigQuery to boost performance and reduce costs. While they come with certain restrictions, their benefits make them well-suited for repeated, compute-heavy queries. Just remember to monitor freshness and be cautious with base table deletions.
Note:
If your query (or a subquery) can be satisfied by an existing materialized view, BigQuery automatically reroutes it to use the materialized view for better efficiency.
Previous Next Home
No comments:
Post a Comment