This is a continuation of my previous post. I recommend reading it first to understand segments.
Why Delete a Segment?
Sometimes, you might want to:
· Delete old data (e.g., logs from 3 months ago)
· Fix corrupted or incorrect data
· Reingest data for a given time range
How to delete a segment permanently?
To delete a segment permanently, you must first mark it as unused, and then delete it from storage.
There are multiple ways to mark segments as unused. For the examples, I am using the blog_visits datasource, which has 11 segments.
Option 1: Mark Specific Segment IDs as Unused
If you know the exact segment IDs, you can pass them in the payload:
Syntax
curl -X POST "http://<COORDINATOR_HOST>:<PORT>/druid/coordinator/v1/datasources/<DATASOURCE>/markUnused" \
-H 'Content-Type: application/json' \
-d '{
"segmentIds": [
"blog_visits_2025-04-18T01:00:00.000Z_2025-04-18T02:00:00.000Z_2024-07-01T10:00:00.000Z"
]
}'
For example, the following snippet marks two segments as unused.
$ curl -X POST "http://localhost:8888/druid/coordinator/v1/datasources/blog_visits/markUnused" \
> -H 'Content-Type: application/json' \
> -d '{
> "segmentIds": [
> "blog_visits_2025-04-18T10:00:00.000Z_2025-04-18T11:00:00.000Z_2025-04-18T16:53:25.204Z",
> "blog_visits_2025-04-18T09:00:00.000Z_2025-04-18T10:00:00.000Z_2025-04-18T16:53:25.204Z"
> ]
> }'
{"numChangedSegments":2,"segmentStateChanged":true}
After marking these 2 segments as unused, 9 segments remain in use.
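When there are many segment IDs, writing the JSON by hand gets error-prone. The following is a minimal sketch of a helper that builds the payload for the markUnused call shown above. The function names and the DRUID_URL variable are hypothetical, not part of Druid, and the sketch assumes segment IDs contain no double quotes or backslashes.

```shell
# Hypothetical helper: build the {"segmentIds": [...]} payload from arguments.
# Assumes the IDs contain no double quotes or backslashes.
build_segment_ids_payload() {
  ids=""
  for id in "$@"; do
    # Add a comma separator before every ID except the first.
    ids="${ids}${ids:+,}\"${id}\""
  done
  printf '{"segmentIds":[%s]}\n' "$ids"
}

# Hypothetical wrapper around the markUnused endpoint.
# Usage: mark_segments_unused <datasource> <segmentId> [<segmentId> ...]
# DRUID_URL is an assumed variable; it defaults to the host used in this post.
mark_segments_unused() {
  datasource="$1"
  shift
  curl -sS -X POST "${DRUID_URL:-http://localhost:8888}/druid/coordinator/v1/datasources/${datasource}/markUnused" \
    -H 'Content-Type: application/json' \
    -d "$(build_segment_ids_payload "$@")"
}
```

Calling mark_segments_unused blog_visits followed by the segment IDs would send the same kind of request as the example above.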
Option 2: Mark Segments as Unused by Time Interval
Use the markUnused endpoint with a time interval:
Syntax
curl -X POST "http://<COORDINATOR_HOST>:<PORT>/druid/coordinator/v1/datasources/<DATASOURCE>/markUnused" \
-H 'Content-Type: application/json' \
-d '{
"interval": "START_INTERVAL/END_INTERVAL"
}'
For example, the following statement marks the segments in the interval from 2025-04-18T05:00:00.000Z to 2025-04-18T07:00:00.000Z as unused.
$ curl -X POST "http://localhost:8888/druid/coordinator/v1/datasources/blog_visits/markUnused" \
> -H 'Content-Type: application/json' \
> -d '{
> "interval": "2025-04-18T05:00:00.000Z/2025-04-18T07:00:00.000Z"
> }'
{"numChangedSegments":2,"segmentStateChanged":true}
After marking another 2 segments as unused, 7 segments remain in use.
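A reversed or empty interval is an easy mistake to make when typing timestamps by hand. As a rough sketch, the helper below checks the interval before building the payload for the interval-based markUnused call. The function names and the DRUID_URL variable are hypothetical, and the check assumes both timestamps are ISO-8601 UTC strings in the same format, so that plain string comparison matches chronological order.

```shell
# Hypothetical helper: print the {"interval": "start/end"} payload,
# or fail if either bound is missing or start is not before end.
build_interval_payload() {
  start="$1"
  end="$2"
  # Same-format ISO-8601 UTC timestamps sort chronologically as strings.
  if [ -z "$start" ] || [ -z "$end" ] || ! [ "$start" \< "$end" ]; then
    echo "invalid interval: '${start}/${end}'" >&2
    return 1
  fi
  printf '{"interval":"%s/%s"}\n' "$start" "$end"
}

# Hypothetical wrapper around the markUnused endpoint.
# Usage: mark_interval_unused <datasource> <start> <end>
# DRUID_URL is an assumed variable; it defaults to the host used in this post.
mark_interval_unused() {
  datasource="$1"
  payload="$(build_interval_payload "$2" "$3")" || return 1
  curl -sS -X POST "${DRUID_URL:-http://localhost:8888}/druid/coordinator/v1/datasources/${datasource}/markUnused" \
    -H 'Content-Type: application/json' \
    -d "$payload"
}
```

With this in place, mark_interval_unused blog_visits 2025-04-18T05:00:00.000Z 2025-04-18T07:00:00.000Z would send the same request as the example above, while swapped bounds would be rejected before any request is made.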
Delete Segments Using a Kill Task
Now that the segments are marked as unused, they are ignored in queries but still live in deep storage. To permanently delete them, submit a kill task:
Syntax
curl -X POST http://<OVERLORD_HOST>:<PORT>/druid/indexer/v1/task \
-H 'Content-Type: application/json' \
-d '
{
"type": "kill",
"dataSource": "DATA_SOURCE",
"interval": "START_INTERVAL/END_INTERVAL"
}
'
Example
$ curl -X POST http://localhost:8888/druid/indexer/v1/task \
> -H 'Content-Type: application/json' \
> -d '
> {
> "type": "kill",
> "dataSource": "blog_visits",
> "interval": "2025-04-18T00:00:00.000Z/2025-04-18T11:00:00.000Z"
> }
> '
{"task":"kill_blog_visits_hacplbld_2025-04-18T00:00:00.000Z_2025-04-18T11:00:00.000Z_2025-04-18T17:27:15.828Z"}
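Submitting the kill task only queues it, so it can help to check whether the task has finished. Below is a small sketch that pulls the task ID out of a submit response shaped like the one above and asks the Overlord for the task status. The function names and the DRUID_URL variable are hypothetical, and the exact shape of the status response may vary between Druid versions, so the sketch prints it as-is instead of parsing it.

```shell
# Hypothetical helper: read a submit response such as {"task":"kill_..."}
# on stdin and print only the task ID.
extract_task_id() {
  sed -n 's/.*"task"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Hypothetical helper: fetch the raw status JSON for a task.
# Usage: get_task_status <taskId>
# DRUID_URL is an assumed variable; it defaults to the host used in this post.
get_task_status() {
  curl -sS "${DRUID_URL:-http://localhost:8888}/druid/indexer/v1/task/$1/status"
}
```

One way to combine them is to pipe the output of the kill task request into extract_task_id, store the result in a variable, and pass that variable to get_task_status until the reported status is no longer running.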