Google BigQuery is a powerful data warehouse solution that enables fast SQL queries using the processing power of Google’s infrastructure. When working with BigQuery, it’s often useful (and sometimes essential) to automate or script table creation using the bq command-line tool, especially when managing infrastructure via CI/CD pipelines or shell scripts.
In this guide, we’ll explore how to create BigQuery tables using the bq mk command. We’ll cover how to define schemas both inline and through external JSON files, and we’ll touch on important table options like partitioning and clustering.
1. Basic Table Creation with Inline Schema
You can create a BigQuery table using the bq CLI with a simple inline schema definition. Below is an example:
bq mk \ --table \ --expiration 3000 \ --description "This is a sample user data table" \ --label env:dev \ --label owner:analytics_team \ --require_partition_filter=true \ --time_partitioning_type=DAY \ --time_partitioning_expiration=6000 \ --clustering_fields=name \ --schema name:STRING,gender:STRING,signup_date:TIMESTAMP \ my_project.my_dataset.user_table
Explanation of Flags:
· --table: Indicates you're creating a table.
· --expiration: Table TTL in seconds.
· --description: Adds a description to the table.
· --label: Useful for tagging resources (key:value format).
· --require_partition_filter: Forces queries to filter by partition (best for cost control).
· --time_partitioning_type=DAY: Creates a table partitioned by ingestion time by day.
· --time_partitioning_expiration: TTL for each partition.
· --clustering_fields: Fields to use for clustering.
· --schema: Inline definition of the schema as field_name:TYPE.
You can omit the project in ‘my_project.my_dataset.user_table’ if you're working within your default project context.
For example, following statement creates user_table in my_org dataset.
bq mk \ --table \ --expiration 3000 \ --description "This is a sample user data table" \ --label env:dev \ --label owner:analytics_team \ --require_partition_filter=true \ --time_partitioning_type=DAY \ --time_partitioning_expiration=6000 \ --clustering_fields=name \ --schema name:STRING,gender:STRING,signup_date:TIMESTAMP \ my_org.user_table
$bq mk \ > --table \ > --expiration 3000 \ > --description "This is a sample user data table" \ > --label env:dev \ > --label owner:analytics_team \ > --require_partition_filter=true \ > --time_partitioning_type=DAY \ > --time_partitioning_expiration=6000 \ > --clustering_fields=name \ > --schema name:STRING,gender:STRING,signup_date:TIMESTAMP \ > my_org.user_table Table 'demo-project:my_org.user_table' successfully created.
How to view the schema of user_table?
Once the table is created, you can verify the schema using:
bq show --schema my_project.my_dataset.my_table
Or simply:
bq show --schema my_dataset.my_table
bq --format=prettyjson show --schema my_org.user_table
$bq --format=prettyjson show --schema my_org.user_table [ { "name": "name", "type": "STRING" }, { "name": "gender", "type": "STRING" }, { "name": "signup_date", "type": "TIMESTAMP" } ]
2. Table Creation Using Schema Definition File (JSON)
For better readability and maintainability, especially with large schemas, you can define your table schema in a JSON file.
Sample schema.json
[ { "name": "name", "type": "STRING", "mode": "REQUIRED", "description": "The name of the user" }, { "name": "gender", "type": "STRING", "mode": "NULLABLE", "description": "Gender of the user" }, { "name": "signup_date", "type": "TIMESTAMP", "mode": "NULLABLE", "description": "The date the user signed up" } ]
Command to create table:
bq mk \ --table \ --expiration 3000 \ --description "User data table with schema from file" \ --label env:prod \ --require_partition_filter=true \ --time_partitioning_type=DAY \ --time_partitioning_expiration=6000 \ --clustering_fields=name \ --schema ./schema.json \ my_dataset.user_table_json
For example, following snippet create ‘user_table_json’ table in my_org dataset using schema.json file
bq mk \ --table \ --expiration 3000 \ --description "User data table with schema from file" \ --label env:prod \ --require_partition_filter=true \ --time_partitioning_type=DAY \ --time_partitioning_expiration=6000 \ --clustering_fields=name \ --schema ./schema.json \ my_org.user_table_json
$bq mk \ > --table \ > --expiration 3000 \ > --description "User data table with schema from file" \ > --label env:prod \ > --require_partition_filter=true \ > --time_partitioning_type=DAY \ > --time_partitioning_expiration=6000 \ > --clustering_fields=name \ > --schema ./schema.json \ > my_org.user_table_json Table 'demo-project:my_org.user_table_json' successfully created.
Execute following command to print the user_table_json schema file.
bq --format=prettyjson show --schema my_org.user_table_json
$bq --format=prettyjson show --schema my_org.user_table_json [ { "description": "The name of the user", "mode": "REQUIRED", "name": "name", "type": "STRING" }, { "description": "Gender of the user", "mode": "NULLABLE", "name": "gender", "type": "STRING" }, { "description": "The date the user signed up", "mode": "NULLABLE", "name": "signup_date", "type": "TIMESTAMP" } ]
Previous Next Home
No comments:
Post a Comment