Monday, 1 September 2025

Parsing Nested JSON in Apache Druid: A Beginner’s Guide with Student Data Example

JSON data is everywhere, and it often comes with nested structures. Apache Druid makes it easy to flatten and ingest nested fields using the jsonPath specification. 

In this blog post, I’ll walk through how to parse a nested JSON file, specifically one with student details including nested address information.

 

We’ll extract fields like student name, age, city, and state from a nested object and load them into Druid for easy querying.

 

studentDetails.json 

{"id": 1, "timestamp": "2025-04-01T10:00:00Z", "name": "Aarav", "age": 15, "address": {"city": "Mumbai", "state": "Maharashtra"}}
{"id": 2, "timestamp": "2025-04-01T10:05:00Z", "name": "Diya", "age": 14, "address": {"city": "Delhi", "state": "Delhi"}}
{"id": 3, "timestamp": "2025-04-01T10:10:00Z", "name": "Rahul", "age": 16, "address": {"city": "Chennai", "state": "Tamil Nadu"}}
{"id": 4, "timestamp": "2025-04-01T10:15:00Z", "name": "Ishita", "age": 15, "address": {"city": "Bengaluru", "state": "Karnataka"}}
{"id": 5, "timestamp": "2025-04-01T10:20:00Z", "name": "Kunal", "age": 17, "address": {"city": "Hyderabad", "state": "Telangana"}}

 

Druid Ingestion Spec (For Parsing Nested Address):

 

student_data_nested_parsing.json

 

{
  "type": "index",
  "spec": {
    "dataSchema": {
      "dataSource": "student_data_nested_parsing",
      "timestampSpec": {
        "column": "timestamp",
        "format": "iso"
      },
      "dimensionsSpec": {
        "dimensions": [
          "name",
          "age",
          "address_city",
          "address_state"
        ]
      },
      "transformSpec": {
        "transforms": []
      },
      "metricsSpec": [],
      "granularitySpec": {
        "type": "uniform",
        "segmentGranularity": "day",
        "queryGranularity": "none",
        "rollup": false
      }
    },
    "ioConfig": {
      "type": "index",
      "inputSource": {
        "type": "local",
        "baseDir": "/Users/Shared/druid_samples",
        "filter": "studentDetails.json"
      },
      "inputFormat": {
        "type": "json",
        "flattenSpec": {
          "useFieldDiscovery": false,
          "fields": [
            { "type": "path", "name": "name", "expr": "$.name" },
            { "type": "path", "name": "age", "expr": "$.age" },
            { "type": "path", "name": "address_city", "expr": "$.address.city" },
            { "type": "path", "name": "address_state", "expr": "$.address.state" }
          ]
        }
      }
    },
    "tuningConfig": {
      "type": "index"
    }
  }
}

Execute following curl command to submit student_data_nested_parsing.json spec.

curl -X POST http://localhost:8081/druid/indexer/v1/task \
     -H 'Content-Type: application/json' \
     -d @student_data_nested_parsing.json

Upon successful execution of the tast, navigate to Query tab and execute following SQL query.  

 

SELECT * FROM "student_data_nested_parsing"

  


 

How flattening works here?

Json Sample

 

{
  "id": 1,
  "timestamp": "2025-04-01T10:00:00Z",
  "name": "Aarav",
  "age": 15,
  "address": {
    "city": "Mumbai",
    "state": "Maharashtra"
  }
}

 

FlattenSpec

"flattenSpec": {
  "useFieldDiscovery": false,
  "fields": [
    { "type": "path", "name": "name", "expr": "$.name" },
    { "type": "path", "name": "age", "expr": "$.age" },
    { "type": "path", "name": "address_city", "expr": "$.address.city" },
    { "type": "path", "name": "address_state", "expr": "$.address.state" }
  ]
}

·      type: path tells Druid that you're using JSONPath expressions to locate the value inside the nested JSON.

·      expr specifies the JSONPath to the desired value.

·      name is the name Druid will use to refer to this flattened field in the index.

 

So for the nested address object:

 

$.address.city becomes address_city

$.address.state becomes address_state

 

Final Flattened Structure Sent to Druid Ingestion:

For the above JSON row, Druid ingests it as if it were:

{
  "timestamp": "2025-04-01T10:00:00Z",
  "name": "Aarav",
  "age": 15,
  "address_city": "Mumbai",
  "address_state": "Maharashtra"
}

Note

useFieldDiscovery: false

This ensures Druid only flattens fields explicitly listed in the fields array.

  

Previous                                                    Next                                                    Home

No comments:

Post a Comment