Google BigQuery is a powerful, serverless data warehouse that makes analyzing large datasets simple and fast. While the UI is great, many engineers prefer using the command-line tool bq for automation, repeatability, and scripting.
In this post, I’ll walk through how to load data into a BigQuery table using the bq load command. You’ll learn how to define the schema inline, use a schema file, or let BigQuery auto-detect the schema. This guide is especially useful if you're working on data pipelines, CI/CD jobs, or one-time loads from structured files.
Basic Syntax
bq load [PROJECT_ID]:[DATASET].[TABLE] [PATH_TO_SOURCE_FILE] [SCHEMA]
If the table does not exist, BigQuery creates it automatically using the schema you provide (or auto-detects it, if specified). If you omit [PROJECT_ID], the table is created in the default project you have already configured.
I am using the following CSV file to demonstrate the examples.
data.csv
Aarav Sharma,29,2020-06-15
Isha Patel,32,2019-11-03
Rohan Mehta,25,2021-04-22
Diya Iyer,28,2018-08-10
Kabir Rao,35,2017-12-01
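If you want to follow along, you can recreate this sample file from the shell (the rows are exactly the sample values shown above):

```shell
# Recreate the sample data.csv used throughout this post
cat > data.csv <<'EOF'
Aarav Sharma,29,2020-06-15
Isha Patel,32,2019-11-03
Rohan Mehta,25,2021-04-22
Diya Iyer,28,2018-08-10
Kabir Rao,35,2017-12-01
EOF

# Count the rows to confirm the file was written
wc -l < data.csv
```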
Example 1: Load CSV with Inline Schema Definition
bq load --source_format=CSV \
  my-project:my_dataset.my_table \
  ./data.csv \
  name:STRING,age:INTEGER,joined_date:DATE
Here,
· The table will be created if it doesn’t exist.
· Schema is specified inline using the name:type format.
· The file data.csv must exist locally or in Cloud Storage.
For example, the following command creates the ‘emps_inline_schema’ table in the my_org dataset.
bq load --source_format=CSV \
  my_org.emps_inline_schema \
  ./data.csv \
  name:STRING,age:INTEGER,joined_date:DATE
$ bq load --source_format=CSV \
> my_org.emps_inline_schema \
> ./data.csv \
> name:STRING,age:INTEGER,joined_date:DATE
Upload complete.
Waiting on bqjob_x74123438b30cf2_000001960dd4bab3_1 ... (3s) Current status: DONE
Use the ‘bq query’ command to print the records.
$ bq query --use_legacy_sql=false "SELECT * FROM my_org.emps_inline_schema"
+--------------+-----+-------------+
|     name     | age | joined_date |
+--------------+-----+-------------+
| Aarav Sharma |  29 | 2020-06-15  |
| Isha Patel   |  32 | 2019-11-03  |
| Rohan Mehta  |  25 | 2021-04-22  |
| Diya Iyer    |  28 | 2018-08-10  |
| Kabir Rao    |  35 | 2017-12-01  |
+--------------+-----+-------------+
Example 2: Load CSV with Schema File (JSON)
Create a schema.json file:
schema.json
[
  {"name": "name", "type": "STRING"},
  {"name": "age", "type": "INTEGER"},
  {"name": "joined_date", "type": "DATE"}
]
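If you are scripting the load, you can write the same schema file from the shell and sanity-check it before use (the JSON check below assumes python3 is available):

```shell
# Write the schema file used in this example
cat > schema.json <<'EOF'
[
  {"name": "name", "type": "STRING"},
  {"name": "age", "type": "INTEGER"},
  {"name": "joined_date", "type": "DATE"}
]
EOF

# Quick sanity check that the file is valid JSON before passing it to bq
python3 -m json.tool schema.json > /dev/null && echo "schema.json is valid JSON"
```

Validating the file up front gives a clearer error than letting the load job fail later.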
Syntax to use a schema file:
bq load --source_format=CSV \
  my-project:my_dataset.my_table \
  ./data.csv \
  ./schema.json
For example, the following command creates the emps_schema_file table in my_org.
bq load --source_format=CSV \
  my_org.emps_schema_file \
  ./data.csv \
  ./schema.json
$ bq load --source_format=CSV \
> my_org.emps_schema_file \
> ./data.csv \
> ./schema.json
Upload complete.
Waiting on bqjob_y6abcdefgj14d8_000001960dd74965_1 ... (3s) Current status: DONE
Use the ‘bq query’ command to print the records.
$ bq query --use_legacy_sql=false "SELECT * FROM my_org.emps_schema_file"
+--------------+-----+-------------+
|     name     | age | joined_date |
+--------------+-----+-------------+
| Aarav Sharma |  29 | 2020-06-15  |
| Isha Patel   |  32 | 2019-11-03  |
| Rohan Mehta  |  25 | 2021-04-22  |
| Diya Iyer    |  28 | 2018-08-10  |
| Kabir Rao    |  35 | 2017-12-01  |
+--------------+-----+-------------+
Example 3: Load CSV with Auto Schema Detection
bq load --source_format=CSV --autodetect \
  my-project:my_dataset.my_table \
  ./data.csv
For example, the following command creates the ‘emps_auto_detect’ table in the my_org dataset.
bq load --source_format=CSV --autodetect \
  my_org.emps_auto_detect \
  ./data.csv
$ bq load --source_format=CSV --autodetect \
> my_org.emps_auto_detect \
> ./data.csv
Upload complete.
Waiting on bqjob_r6f59198129738268_000001960asgadgsa_1 ... (3s) Current status: DONE
Let’s query emps_auto_detect table.
$ bq query --use_legacy_sql=false "SELECT * FROM my_org.emps_auto_detect"
+----------------+---------------+--------------+
| string_field_0 | int64_field_1 | date_field_2 |
+----------------+---------------+--------------+
| Aarav Sharma   |            29 | 2020-06-15   |
| Isha Patel     |            32 | 2019-11-03   |
| Rohan Mehta    |            25 | 2021-04-22   |
| Diya Iyer      |            28 | 2018-08-10   |
| Kabir Rao      |            35 | 2017-12-01   |
+----------------+---------------+--------------+
As you can see from the output, the generated column names are not meaningful. Let’s add a header row to the CSV file and create another table so that the column names reflect the headers.
dataWithHeaders.csv
name,age,joining_date
Aarav Sharma,29,2020-06-15
Isha Patel,32,2019-11-03
Rohan Mehta,25,2021-04-22
Diya Iyer,28,2018-08-10
Kabir Rao,35,2017-12-01
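If you already have the headerless file from the earlier examples, one quick way to produce dataWithHeaders.csv is to prepend the header row. (A two-row stand-in for data.csv is created first so this snippet runs on its own.)

```shell
# Stand-in for the headerless data.csv from earlier, so the snippet is self-contained
printf '%s\n' 'Aarav Sharma,29,2020-06-15' 'Isha Patel,32,2019-11-03' > data.csv

# Prepend the header row to produce dataWithHeaders.csv
{ echo "name,age,joining_date"; cat data.csv; } > dataWithHeaders.csv

# The first line is now the header
head -1 dataWithHeaders.csv
```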
Execute the following command to create the emps_auto_detect_with_headers table.
bq load --source_format=CSV --autodetect \
  my_org.emps_auto_detect_with_headers \
  ./dataWithHeaders.csv
Let's query the content of the emps_auto_detect_with_headers table.
$ bq query --use_legacy_sql=false "SELECT * FROM my_org.emps_auto_detect_with_headers"
+--------------+-----+--------------+
|     name     | age | joining_date |
+--------------+-----+--------------+
| Aarav Sharma |  29 | 2020-06-15   |
| Isha Patel   |  32 | 2019-11-03   |
| Rohan Mehta  |  25 | 2021-04-22   |
| Diya Iyer    |  28 | 2018-08-10   |
| Kabir Rao    |  35 | 2017-12-01   |
+--------------+-----+--------------+
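In an automated pipeline, it can be worth sanity-checking the file before calling bq load, since a row with the wrong number of fields is a common cause of load failures. A minimal awk check, assuming a three-column file like dataWithHeaders.csv (a small sample file is written first so the snippet is self-contained):

```shell
# Self-contained sample file; in practice point this at your real CSV
printf '%s\n' 'name,age,joining_date' \
  'Aarav Sharma,29,2020-06-15' \
  'Isha Patel,32,2019-11-03' > dataWithHeaders.csv

# Print any line whose comma-separated field count is not 3;
# no output means every row has the expected shape
awk -F',' 'NF != 3 {print "bad row " NR ": " $0}' dataWithHeaders.csv
```

Running a check like this before the load surfaces malformed rows with a line number, which is easier to debug than a failed load job.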
In summary, the bq load command is a powerful way to ingest data into BigQuery from the command line. Whether you're defining schemas inline, through a JSON file, or using auto-detection, BigQuery provides the flexibility your workflows need. Now you're ready to automate your data loading tasks with confidence!