JSON data is everywhere, and it often comes with nested structures. Apache Druid makes it easy to flatten and ingest nested fields using the jsonPath specification.
In this blog post, I’ll walk through how to parse a nested JSON file, specifically one with student details including nested address information.
We’ll extract fields like student name, age, city, and state from a nested object and load them into Druid for easy querying.
studentDetails.json
{"id": 1, "timestamp": "2025-04-01T10:00:00Z", "name": "Aarav", "age": 15, "address": {"city": "Mumbai", "state": "Maharashtra"}} {"id": 2, "timestamp": "2025-04-01T10:05:00Z", "name": "Diya", "age": 14, "address": {"city": "Delhi", "state": "Delhi"}} {"id": 3, "timestamp": "2025-04-01T10:10:00Z", "name": "Rahul", "age": 16, "address": {"city": "Chennai", "state": "Tamil Nadu"}} {"id": 4, "timestamp": "2025-04-01T10:15:00Z", "name": "Ishita", "age": 15, "address": {"city": "Bengaluru", "state": "Karnataka"}} {"id": 5, "timestamp": "2025-04-01T10:20:00Z", "name": "Kunal", "age": 17, "address": {"city": "Hyderabad", "state": "Telangana"}}
Druid Ingestion Spec (For Parsing Nested Address):
student_data_nested_parsing.json
{ "type": "index", "spec": { "dataSchema": { "dataSource": "student_data_nested_parsing", "timestampSpec": { "column": "timestamp", "format": "iso" }, "dimensionsSpec": { "dimensions": [ "name", "age", "address_city", "address_state" ] }, "transformSpec": { "transforms": [] }, "metricsSpec": [], "granularitySpec": { "type": "uniform", "segmentGranularity": "day", "queryGranularity": "none", "rollup": false } }, "ioConfig": { "type": "index", "inputSource": { "type": "local", "baseDir": "/Users/Shared/druid_samples", "filter": "studentDetails.json" }, "inputFormat": { "type": "json", "flattenSpec": { "useFieldDiscovery": false, "fields": [ { "type": "path", "name": "name", "expr": "$.name" }, { "type": "path", "name": "age", "expr": "$.age" }, { "type": "path", "name": "address_city", "expr": "$.address.city" }, { "type": "path", "name": "address_state", "expr": "$.address.state" } ] } } }, "tuningConfig": { "type": "index" } } }
Execute following curl command to submit student_data_nested_parsing.json spec.
curl -X POST http://localhost:8081/druid/indexer/v1/task \ -H 'Content-Type: application/json' \ -d @student_data_nested_parsing.json
Upon successful execution of the tast, navigate to Query tab and execute following SQL query.
SELECT * FROM "student_data_nested_parsing"
How flattening works here?
Json Sample
{ "id": 1, "timestamp": "2025-04-01T10:00:00Z", "name": "Aarav", "age": 15, "address": { "city": "Mumbai", "state": "Maharashtra" } }
FlattenSpec
"flattenSpec": { "useFieldDiscovery": false, "fields": [ { "type": "path", "name": "name", "expr": "$.name" }, { "type": "path", "name": "age", "expr": "$.age" }, { "type": "path", "name": "address_city", "expr": "$.address.city" }, { "type": "path", "name": "address_state", "expr": "$.address.state" } ] }
· type: path tells Druid that you're using JSONPath expressions to locate the value inside the nested JSON.
· expr specifies the JSONPath to the desired value.
· name is the name Druid will use to refer to this flattened field in the index.
So for the nested address object:
$.address.city becomes address_city
$.address.state becomes address_state
Final Flattened Structure Sent to Druid Ingestion:
For the above JSON row, Druid ingests it as if it were:
{ "timestamp": "2025-04-01T10:00:00Z", "name": "Aarav", "age": 15, "address_city": "Mumbai", "address_state": "Maharashtra" }
Note
useFieldDiscovery: false
This ensures Druid only flattens fields explicitly listed in the fields array.
Previous Next Home
No comments:
Post a Comment