When ingesting data into Apache Druid, sometimes you don't want to load everything, you may want to filter out unwanted rows. For example, you might want to ingest only rows where the product is "Mobile", or the city is "Delhi".
In this blog post, we’ll learn how to use Druid’s transformSpec.filter to include only the rows we care about. We’ll demonstrate this using a simple CSV file with product sales data.
sales_data.csv
timestamp,product,city,sales 2025-04-01T10:00:00Z,Laptop,Delhi,300 2025-04-01T10:00:00Z,Laptop,Delhi,200 2025-04-01T11:00:00Z,Tablet,Mumbai,150 2025-04-01T11:00:00Z,Tablet,Mumbai,50 2025-04-01T12:00:00Z,Mobile,Bengaluru,200 2025-04-01T13:00:00Z,Laptop,Hyderabad,250 2025-04-01T14:00:00Z,Tablet,Chennai,180 2025-04-01T15:00:00Z,Mobile,Pune,220 2025-04-01T15:00:00Z,Mobile,Pune,80
I want to ingest only rows where product = Mobile using a filter.
Druid Ingestion Spec with a Filter is given below.
sales_data_filters.json
{ "type": "index", "spec": { "dataSchema": { "dataSource": "sales_data_filter_demo", "timestampSpec": { "column": "timestamp", "format": "iso" }, "dimensionsSpec": { "dimensions": ["product", "city"] }, "transformSpec": { "filter": { "type": "selector", "dimension": "product", "value": "Mobile" } }, "metricsSpec": [ { "name": "total_sales", "type": "doubleSum", "fieldName": "sales" } ], "granularitySpec": { "type": "uniform", "segmentGranularity": "day", "queryGranularity": "hour", "rollup": false } }, "ioConfig": { "type": "index", "inputSource": { "type": "local", "baseDir": "/Users/Shared/druid_samples", "filter": "sales_data.csv" }, "inputFormat": { "type": "csv", "columns": ["timestamp", "product", "city", "sales"] } }, "tuningConfig": { "type": "index" } } }
Execute below command to submit the task from sales_data_filters.json spec file.
curl -X POST http://localhost:8081/druid/indexer/v1/task \ -H 'Content-Type: application/json' \ -d @sales_data_filters.json
Navigate to Druid workbench -> Query tab and execute following sql query.
SELECT * FROM "sales_data_filter_demo"
You can see only Mobile specific data.
Previous Next Home
No comments:
Post a Comment