This post continues my previous one. I recommend reading that post first to understand what segments are.
Why Delete a Segment?
Sometimes, you might want to:
· Delete old data (e.g., logs from 3 months ago)
· Fix corrupted or incorrect data
· Reingest data for a given time range
How to Delete a Segment Permanently
Deleting a segment permanently takes two steps: first mark it as unused, then run a kill task to remove it from deep storage and the metadata store.
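There are multiple ways to mark segments as unused. The examples use the blog_visits datasource, which has 11 segments. Requests go to localhost:8888, the Router port in the Druid quickstart, which forwards them to the Coordinator and Overlord.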
Option 1: Mark Specific Segment IDs as Unused
If you know the exact segment IDs, you can pass them in the payload:
Syntax
curl -X POST "http://<COORDINATOR_HOST>:<PORT>/druid/coordinator/v1/datasources/<DATASOURCE>/markUnused" \
  -H 'Content-Type: application/json' \
  -d '{
        "segmentIds": [
          "blog_visits_2025-04-18T01:00:00.000Z_2025-04-18T02:00:00.000Z_2024-07-01T10:00:00.000Z"
        ]
      }'
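If you mark segments by ID often, a small shell helper saves retyping the JSON. This is a minimal sketch, not part of Druid: `build_segment_ids_payload` and `mark_segments_unused` are hypothetical names, `DRUID_URL` is an assumed environment variable that defaults to the `http://localhost:8888` address used in this post, and no JSON escaping is done, so it assumes the IDs contain no quotes or backslashes.

```shell
# Hypothetical helper: print a markUnused payload for the segment IDs passed
# as arguments. No JSON escaping is done, so IDs must not contain quotes or
# backslashes.
build_segment_ids_payload() {
  payload='{"segmentIds":['
  sep=''
  for id in "$@"; do
    payload="$payload$sep\"$id\""
    sep=','
  done
  printf '%s]}\n' "$payload"
}

# Hypothetical helper: mark the given segments of a datasource as unused.
# DRUID_URL is an assumed variable; it defaults to the address used in this post.
mark_segments_unused() {
  datasource=$1
  shift
  curl -sS -X POST \
    "${DRUID_URL:-http://localhost:8888}/druid/coordinator/v1/datasources/$datasource/markUnused" \
    -H 'Content-Type: application/json' \
    -d "$(build_segment_ids_payload "$@")"
}
```

For example, `mark_segments_unused blog_visits blog_visits_2025-04-18T10:00:00.000Z_2025-04-18T11:00:00.000Z_2025-04-18T16:53:25.204Z` sends the same kind of request as the Syntax command above, with that single ID.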
For example, the following command marks two segments as unused.
curl -X POST "http://localhost:8888/druid/coordinator/v1/datasources/blog_visits/markUnused" \
  -H 'Content-Type: application/json' \
  -d '{
        "segmentIds": [
          "blog_visits_2025-04-18T10:00:00.000Z_2025-04-18T11:00:00.000Z_2025-04-18T16:53:25.204Z",
          "blog_visits_2025-04-18T09:00:00.000Z_2025-04-18T10:00:00.000Z_2025-04-18T16:53:25.204Z"
        ]
      }'
Response
{"numChangedSegments":2,"segmentStateChanged":true}
After marking these 2 segments as unused, you can see that 9 segments remain.
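To check the remaining count from the shell, you could ask the Coordinator for the datasource's used segments. This sketch assumes that `GET /druid/coordinator/v1/metadata/datasources/<DATASOURCE>/segments` returns a flat JSON array of segment ID strings and that IDs never contain commas. `count_used_segments` and `count_json_array_items` are hypothetical helpers, `DRUID_URL` is an assumed variable, and `jq` would be a sturdier parser if you have it installed.

```shell
# Hypothetical helper: count the items of a flat JSON array of strings read
# from stdin, e.g. ["id1","id2"] has 2 items. Assumes no item contains a comma.
# Steps: join lines, drop brackets/quotes/spaces, put one item per line,
# then count the non-empty lines.
count_json_array_items() {
  tr -d '\n' | sed 's/[][" ]//g' | tr ',' '\n' | grep -c .
}

# Hypothetical helper: print how many used segments a datasource has,
# assuming the metadata endpoint returns segment IDs as a JSON array.
# DRUID_URL is an assumed variable defaulting to localhost:8888.
count_used_segments() {
  curl -sS "${DRUID_URL:-http://localhost:8888}/druid/coordinator/v1/metadata/datasources/$1/segments" |
    count_json_array_items
}
```

At this point in the walkthrough, `count_used_segments blog_visits` should print 9.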
Option 2: Mark Segments as Unused by Time Interval
Use the markUnused endpoint with a time interval:
Syntax
curl -X POST "http://<COORDINATOR_HOST>:<PORT>/druid/coordinator/v1/datasources/<DATASOURCE>/markUnused" \
  -H 'Content-Type: application/json' \
  -d '{
        "interval": "START_INTERVAL/END_INTERVAL"
      }'
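Druid intervals are ISO 8601 `start/end` strings with an inclusive start and an exclusive end, so `2025-04-18T05:00:00.000Z/2025-04-18T07:00:00.000Z` covers the hourly segments starting at 05:00 and 06:00 but not the one starting at 07:00. The sketch below takes the start and end as separate arguments so the `/` separator cannot be forgotten. `build_interval_payload` and `mark_interval_unused` are hypothetical names, and `DRUID_URL` is an assumed variable defaulting to `http://localhost:8888`.

```shell
# Hypothetical helper: print the interval form of the markUnused payload.
# Taking start and end separately means the "/" separator is always present.
build_interval_payload() {
  if [ -z "$1" ] || [ -z "$2" ]; then
    echo "usage: build_interval_payload START END" >&2
    return 1
  fi
  printf '{"interval":"%s/%s"}\n' "$1" "$2"
}

# Hypothetical helper: mark the segments of datasource $1 that fall within
# $2/$3 as unused. Nothing is sent if the payload cannot be built.
# DRUID_URL is an assumed variable defaulting to localhost:8888.
mark_interval_unused() {
  payload=$(build_interval_payload "$2" "$3") || return 1
  curl -sS -X POST \
    "${DRUID_URL:-http://localhost:8888}/druid/coordinator/v1/datasources/$1/markUnused" \
    -H 'Content-Type: application/json' \
    -d "$payload"
}
```

For example, `mark_interval_unused blog_visits 2025-04-18T05:00:00.000Z 2025-04-18T07:00:00.000Z` sends the same request as the example that follows.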
For example, the following command marks the segments between 2025-04-18T05:00:00.000Z and 2025-04-18T07:00:00.000Z as unused.
curl -X POST "http://localhost:8888/druid/coordinator/v1/datasources/blog_visits/markUnused" \
  -H 'Content-Type: application/json' \
  -d '{
        "interval": "2025-04-18T05:00:00.000Z/2025-04-18T07:00:00.000Z"
      }'
Response
{"numChangedSegments":2,"segmentStateChanged":true}
After marking another 2 segments as unused, 7 segments remain.
Delete Segments Using a Kill Task
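Once segments are marked as unused, they are no longer queried, but their files still live in deep storage and their records remain in the metadata store. To delete them permanently, submit a kill task. A kill task removes only the unused segments in the given interval, so any used segments in that interval are left alone: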
Syntax
curl -X POST http://<OVERLORD_HOST>:<PORT>/druid/indexer/v1/task \
  -H 'Content-Type: application/json' \
  -d '
    {
      "type": "kill",
      "dataSource": "DATA_SOURCE",
      "interval": "START_INTERVAL/END_INTERVAL"
    }
  '
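The same helper pattern works for the kill task. In this sketch `build_kill_task_payload`, `extract_task_id` and `submit_kill_task` are hypothetical names, `DRUID_URL` is an assumed variable defaulting to `http://localhost:8888`, and the task ID is pulled out of the `{"task":"..."}` response with `sed` rather than a real JSON parser.

```shell
# Hypothetical helper: print a kill task spec for DATASOURCE, START and END.
build_kill_task_payload() {
  printf '{"type":"kill","dataSource":"%s","interval":"%s/%s"}\n' "$1" "$2" "$3"
}

# Hypothetical helper: read an Overlord response such as {"task":"kill_..."}
# from stdin and print only the task ID.
extract_task_id() {
  sed -n 's/.*"task"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Hypothetical helper: submit the kill task and print its ID.
# DRUID_URL is an assumed variable defaulting to localhost:8888.
submit_kill_task() {
  curl -sS -X POST "${DRUID_URL:-http://localhost:8888}/druid/indexer/v1/task" \
    -H 'Content-Type: application/json' \
    -d "$(build_kill_task_payload "$1" "$2" "$3")" |
    extract_task_id
}
```

For example, `submit_kill_task blog_visits 2025-04-18T00:00:00.000Z 2025-04-18T11:00:00.000Z` submits the same task as the example that follows and prints only its ID.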
Example
curl -X POST http://localhost:8888/druid/indexer/v1/task \
  -H 'Content-Type: application/json' \
  -d '
    {
      "type": "kill",
      "dataSource": "blog_visits",
      "interval": "2025-04-18T00:00:00.000Z/2025-04-18T11:00:00.000Z"
    }
  '
Response
{"task":"kill_blog_visits_hacplbld_2025-04-18T00:00:00.000Z_2025-04-18T11:00:00.000Z_2025-04-18T17:27:15.828Z"}
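The response only means the Overlord accepted the task; the deletion itself runs asynchronously. One way to wait for it is to poll the Overlord's task status endpoint, `GET /druid/indexer/v1/task/<TASK_ID>/status`. This is a sketch: `wait_for_task` and `extract_task_state` are hypothetical helpers, `DRUID_URL`, `POLL_SECONDS` and `MAX_POLLS` are assumed variables with defaults, and the state is read from the `status` or `statusCode` field with `sed`, assuming the field names used by recent Druid versions.

```shell
# Hypothetical helper: read a task status response from stdin and print the
# task state. Matches a "status" or "statusCode" key whose value is a string
# such as RUNNING, SUCCESS or FAILED (the nested "status":{...} object is skipped).
extract_task_state() {
  tr -d '\n' |
    sed -n 's/.*"status[A-Za-z]*"[[:space:]]*:[[:space:]]*"\([A-Z_]*\)".*/\1/p'
}

# Hypothetical helper: poll the Overlord until the task succeeds or fails,
# giving up after MAX_POLLS attempts.
# DRUID_URL, POLL_SECONDS and MAX_POLLS are assumed variables with defaults.
wait_for_task() {
  task_id=$1
  attempts=${MAX_POLLS:-60}
  while [ "$attempts" -gt 0 ]; do
    state=$(curl -sS "${DRUID_URL:-http://localhost:8888}/druid/indexer/v1/task/$task_id/status" |
      extract_task_state)
    case $state in
      SUCCESS) echo "$task_id: SUCCESS"; return 0 ;;
      FAILED) echo "$task_id: FAILED" >&2; return 1 ;;
    esac
    sleep "${POLL_SECONDS:-5}"
    attempts=$((attempts - 1))
  done
  echo "$task_id: not finished after ${MAX_POLLS:-60} polls" >&2
  return 1
}
```

For example, `wait_for_task kill_blog_visits_hacplbld_2025-04-18T00:00:00.000Z_2025-04-18T11:00:00.000Z_2025-04-18T17:27:15.828Z` polls the task submitted above every 5 seconds, up to 60 times by default.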



 
 