This is a continuation of my previous post. I recommend reading that post first to understand segments.
Why Delete a Segment?
Sometimes, you might want to:
· Delete old data (e.g., logs from 3 months ago)
· Fix corrupted or incorrect data
· Reingest data for a given time range
How to Delete a Segment Permanently?
To delete a segment permanently, you must first mark it as unused, and then delete it from storage.
There are multiple ways to mark segments as unused. For the examples, I am using the blog_visits datasource, which has 11 segments.
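To check the segment count before and after each step, one option is to ask the Coordinator for the datasource's segment list and count the entries. The sketch below assumes that GET /druid/coordinator/v1/datasources/<DATASOURCE>/segments returns a JSON array of segment ID strings and that it is reachable at the same localhost:8888 address used in the examples. The function name count_segment_ids is my own, not part of Druid.

```shell
# Count the quoted strings in a JSON array of segment IDs read from stdin.
# Assumption: the input is a flat array of strings with no embedded quotes.
count_segment_ids() {
  grep -o '"[^"]*"' | wc -l | tr -d ' '
}

# Hypothetical usage against a running cluster (not executed here):
#   curl -s "http://localhost:8888/druid/coordinator/v1/datasources/blog_visits/segments" \
#     | count_segment_ids
```

If jq is installed, piping the response to jq length is a more robust way to get the same number.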
Option 1: Mark Specific Segment IDs as Unused
If you know the exact segment IDs, you can pass them in the payload:
Syntax
curl -X POST "http://<COORDINATOR_HOST>:<PORT>/druid/coordinator/v1/datasources/<DATASOURCE>/markUnused" \
-H 'Content-Type: application/json' \
-d '{
"segmentIds": [
"blog_visits_2025-04-18T01:00:00.000Z_2025-04-18T02:00:00.000Z_2024-07-01T10:00:00.000Z"
]
}'
For example, the following snippet marks two segments as unused.
$ curl -X POST "http://localhost:8888/druid/coordinator/v1/datasources/blog_visits/markUnused" \
> -H 'Content-Type: application/json' \
> -d '{
> "segmentIds": [
> "blog_visits_2025-04-18T10:00:00.000Z_2025-04-18T11:00:00.000Z_2025-04-18T16:53:25.204Z",
> "blog_visits_2025-04-18T09:00:00.000Z_2025-04-18T10:00:00.000Z_2025-04-18T16:53:25.204Z"
> ]
> }'
{"numChangedSegments":2,"segmentStateChanged":true}
After marking these 2 segments as unused, 9 segments remain.
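Typing the segmentIds payload by hand gets error-prone once the list grows, so here is a minimal sketch that builds it from command-line arguments. The helper name build_segment_ids_payload is hypothetical, and it assumes the IDs contain no characters that need JSON escaping, which holds for IDs shaped like the ones above.

```shell
# Build the {"segmentIds": [...]} payload from segment IDs passed as arguments.
# Assumption: IDs need no JSON escaping (no quotes or backslashes).
build_segment_ids_payload() {
  local ids="" id
  for id in "$@"; do
    # Append a comma separator only when the list is already non-empty.
    ids="${ids:+$ids,}\"$id\""
  done
  printf '{"segmentIds":[%s]}\n' "$ids"
}

# Hypothetical usage (not executed here):
#   curl -X POST "http://localhost:8888/druid/coordinator/v1/datasources/blog_visits/markUnused" \
#     -H 'Content-Type: application/json' \
#     -d "$(build_segment_ids_payload "$SEGMENT_ID_1" "$SEGMENT_ID_2")"
```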
Option 2: Mark Segments as Unused by Time Interval
Use the markUnused endpoint with a time interval:
Syntax
curl -X POST "http://<COORDINATOR_HOST>:<PORT>/druid/coordinator/v1/datasources/<DATASOURCE>/markUnused" \
-H 'Content-Type: application/json' \
-d '{
"interval": "START_INTERVAL/END_INTERVAL"
}'
For example, the following statement marks the segments between 2025-04-18T05:00:00.000Z and 2025-04-18T07:00:00.000Z as unused.
curl -X POST "http://localhost:8888/druid/coordinator/v1/datasources/blog_visits/markUnused" \
-H 'Content-Type: application/json' \
-d '{
"interval": "2025-04-18T05:00:00.000Z/2025-04-18T07:00:00.000Z"
}'
$ curl -X POST "http://localhost:8888/druid/coordinator/v1/datasources/blog_visits/markUnused" \
> -H 'Content-Type: application/json' \
> -d '{
> "interval": "2025-04-18T05:00:00.000Z/2025-04-18T07:00:00.000Z"
> }'
{"numChangedSegments":2,"segmentStateChanged":true}
After marking another 2 segments as unused, 7 segments remain.
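Because an interval can cover many segments at once, it may be worth sanity-checking the string before sending it. The sketch below is a simple guard of my own, written for bash, and not anything Druid provides. It assumes both timestamps use the same fixed-width UTC format as the examples, so that plain string comparison orders them correctly.

```shell
# Return 0 if the argument looks like START/END with START earlier than END.
# Assumption: both timestamps share one fixed-width UTC format, e.g.
# 2025-04-18T05:00:00.000Z, so lexicographic order matches time order.
is_valid_interval() {
  local start end
  case "$1" in
    */*) ;;
    *) return 1 ;;
  esac
  start=${1%%/*}   # text before the first slash
  end=${1#*/}      # text after the first slash
  [ -n "$start" ] && [ -n "$end" ] && [ "$start" \< "$end" ]
}

# Hypothetical usage (not executed here):
#   is_valid_interval "$INTERVAL" || { echo "bad interval: $INTERVAL" >&2; exit 1; }
```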
Delete Segments Using a Kill Task
Now that the segments are marked as unused, they are ignored in queries but still live in deep storage. To permanently delete them, submit a kill task. A kill task only removes segments in the given interval that are already marked as unused:
Syntax
curl -X POST http://<OVERLORD_HOST>:<PORT>/druid/indexer/v1/task \
-H 'Content-Type: application/json' \
-d '
{
"type": "kill",
"dataSource": "DATA_SOURCE",
"interval": "START_INTERVAL/END_INTERVAL"
}
'
Example
curl -X POST http://localhost:8888/druid/indexer/v1/task \
-H 'Content-Type: application/json' \
-d '
{
"type": "kill",
"dataSource": "blog_visits",
"interval": "2025-04-18T00:00:00.000Z/2025-04-18T11:00:00.000Z"
}
'
$ curl -X POST http://localhost:8888/druid/indexer/v1/task \
> -H 'Content-Type: application/json' \
> -d '
> {
> "type": "kill",
> "dataSource": "blog_visits",
> "interval": "2025-04-18T00:00:00.000Z/2025-04-18T11:00:00.000Z"
> }
> '
{"task":"kill_blog_visits_hacplbld_2025-04-18T00:00:00.000Z_2025-04-18T11:00:00.000Z_2025-04-18T17:27:15.828Z"}
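The response only confirms that the task was submitted, so you may want to check whether it finished. Here is a small sketch that pulls the task ID out of a response like the one above and builds the URL for the Overlord's task status endpoint (GET /druid/indexer/v1/task/<TASK_ID>/status). The function names and the DRUID_URL variable are my own, and the default base URL is an assumption that matches the localhost:8888 examples.

```shell
# Extract the task ID from a {"task":"..."} response read from stdin.
extract_task_id() {
  sed -n 's/.*"task"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Print the status URL for a task ID.
# Assumption: DRUID_URL defaults to the address used in the examples.
task_status_url() {
  printf '%s/druid/indexer/v1/task/%s/status\n' "${DRUID_URL:-http://localhost:8888}" "$1"
}

# Hypothetical usage (not executed here):
#   task_id=$(curl -s -X POST http://localhost:8888/druid/indexer/v1/task \
#     -H 'Content-Type: application/json' -d @kill_task.json | extract_task_id)
#   curl -s "$(task_status_url "$task_id")"
```

Here kill_task.json is a placeholder file holding the kill task JSON shown earlier.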


