Tuesday, 9 September 2025

How to delete segments in Druid?

This post is a continuation of my previous post. I recommend reading that post first to understand what segments are.

Why Delete a Segment?

Sometimes, you might want to:

·      Delete old data (e.g., logs from 3 months ago)

·      Fix corrupted or incorrect data

·      Reingest data for a given time range

How to delete a segment permanently?

To delete a segment permanently, you must first mark it as unused and then run a kill task, which removes it from the metadata store and from deep storage.

There are multiple ways to mark segments as unused. The examples below use the blog_visits datasource, which initially has 11 segments.
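
To check how many segments a datasource has, and to find the exact IDs that Option 1 needs, you can query the sys.segments system table through the Druid SQL API (POST /druid/v2/sql). The following is a minimal sketch in POSIX shell, assuming curl is installed and that the Router (port 8888 in the quickstart) forwards the request to a Broker. The function names are hypothetical helpers, not part of Druid. The is_published = 1 filter keeps only segments marked as used, so the list should shrink once the Coordinator picks up a markUnused change.

```shell
# Hypothetical helper: print the SQL API request body that lists the used
# segment IDs of a datasource (is_published = 1 means "marked as used").
segments_sql_payload() (
  datasource="$1"
  printf '%s\n' "{\"query\": \"SELECT segment_id FROM sys.segments WHERE datasource = '$datasource' AND is_published = 1\"}"
)

# Hypothetical helper: send that query to the SQL endpoint.
# $1 = base URL, e.g. http://localhost:8888; $2 = datasource name.
list_segments() (
  host="$1"
  segments_sql_payload "$2" |
    curl -s -X POST "$host/druid/v2/sql" \
      -H 'Content-Type: application/json' \
      --data-binary @-
)
```

For example, list_segments http://localhost:8888 blog_visits returns a JSON array with one object per used segment, so the number of objects is the segment count.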

Option 1: Mark Specific Segment IDs as Unused

If you know the exact segment IDs, you can pass them in the payload:

Syntax

curl -X POST "http://<COORDINATOR_HOST>:<PORT>/druid/coordinator/v1/datasources/<DATASOURCE>/markUnused" \
  -H 'Content-Type: application/json' \
  -d '{
        "segmentIds": [
          "blog_visits_2025-04-18T01:00:00.000Z_2025-04-18T02:00:00.000Z_2024-07-01T10:00:00.000Z"
        ]
      }'
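
Segment IDs follow a fixed pattern: the datasource name, the interval start, the interval end, and the version, joined by underscores, with a partition number appended when it is not 0. The sketch below composes an ID from those parts only to make that structure visible; segment_id is a hypothetical helper. In practice it is safer to copy IDs from a listing than to build them by hand, because the version must match exactly.

```shell
# Hypothetical helper: compose a segment ID from its parts.
# Format: <datasource>_<intervalStart>_<intervalEnd>_<version>[_<partitionNum>]
# The partition number is omitted when it is 0 (the default here).
segment_id() (
  datasource="$1"; start="$2"; end="$3"; version="$4"; partition="${5:-0}"
  id="${datasource}_${start}_${end}_${version}"
  if [ "$partition" -ne 0 ]; then
    id="${id}_${partition}"
  fi
  printf '%s\n' "$id"
)
```

Note that blog_visits itself contains an underscore, so splitting an ID on underscores does not reliably recover the datasource name.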

For example, the following command marks two segments as unused:

curl -X POST "http://localhost:8888/druid/coordinator/v1/datasources/blog_visits/markUnused" \
  -H 'Content-Type: application/json' \
  -d '{
        "segmentIds": [
          "blog_visits_2025-04-18T10:00:00.000Z_2025-04-18T11:00:00.000Z_2025-04-18T16:53:25.204Z",
          "blog_visits_2025-04-18T09:00:00.000Z_2025-04-18T10:00:00.000Z_2025-04-18T16:53:25.204Z"
        ]
      }'

The Coordinator responds with:

{"numChangedSegments":2,"segmentStateChanged":true}

After these 2 segments are marked as unused, the datasource is left with 9 segments.
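
When there are more than a couple of IDs, typing the JSON by hand gets error-prone. The following sketch builds the segmentIds payload from its arguments and posts it to the same markUnused endpoint. The helper names are hypothetical, and the payload builder does no JSON escaping, so it assumes the IDs contain no double quotes or backslashes.

```shell
# Hypothetical helper: build {"segmentIds": [...]} from the IDs given as arguments.
mark_unused_ids_payload() (
  body='{"segmentIds": ['
  sep=''
  for id in "$@"; do
    body="${body}${sep}\"${id}\""
    sep=', '
  done
  printf '%s]}\n' "$body"
)

# Hypothetical helper: POST the payload to the Coordinator's markUnused endpoint.
# $1 = base URL (e.g. http://localhost:8888), $2 = datasource, remaining args = segment IDs.
mark_unused_ids() (
  host="$1"; datasource="$2"; shift 2
  mark_unused_ids_payload "$@" |
    curl -s -X POST "$host/druid/coordinator/v1/datasources/$datasource/markUnused" \
      -H 'Content-Type: application/json' \
      --data-binary @-
)
```

For example, calling mark_unused_ids http://localhost:8888 blog_visits with the two IDs from the example above sends the same request as that curl command.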

Option 2: Mark Segments as Unused by Time Interval

Use the markUnused endpoint with a time interval:

Syntax

curl -X POST "http://<COORDINATOR_HOST>:<PORT>/druid/coordinator/v1/datasources/<DATASOURCE>/markUnused" \
  -H 'Content-Type: application/json' \
  -d '{
        "interval": "START_INTERVAL/END_INTERVAL"
      }'
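
The interval variant can be wrapped the same way. This is a minimal sketch with hypothetical helper names: it takes the start and end timestamps as separate arguments and joins them with a slash, which is the ISO 8601 interval form the endpoint expects.

```shell
# Hypothetical helper: build {"interval": "<start>/<end>"}.
mark_unused_interval_payload() (
  printf '{"interval": "%s/%s"}\n' "$1" "$2"
)

# Hypothetical helper: POST it to the Coordinator's markUnused endpoint.
# $1 = base URL, $2 = datasource, $3 = interval start, $4 = interval end.
mark_unused_interval() (
  host="$1"; datasource="$2"
  mark_unused_interval_payload "$3" "$4" |
    curl -s -X POST "$host/druid/coordinator/v1/datasources/$datasource/markUnused" \
      -H 'Content-Type: application/json' \
      --data-binary @-
)
```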

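For example, the following command marks the segments in the interval from 2025-04-18T05:00:00.000Z to 2025-04-18T07:00:00.000Z as unused. Only segments that fall entirely within the interval are affected; segments that only partially overlap it are left as they are.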

curl -X POST "http://localhost:8888/druid/coordinator/v1/datasources/blog_visits/markUnused" \
  -H 'Content-Type: application/json' \
  -d '{
        "interval": "2025-04-18T05:00:00.000Z/2025-04-18T07:00:00.000Z"
      }'

The Coordinator responds with:

{"numChangedSegments":2,"segmentStateChanged":true}

After marking another 2 segments as unused, the datasource is left with 7 segments.
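
For the "delete old data" case from the start of this post, the interval end is usually computed rather than typed. The sketch below assumes GNU date (the -d "@seconds" form is not available in BSD or macOS date). It computes a cutoff N days before a reference time and pairs it with 1000-01-01 as an assumed far-past start. older_than_interval is a hypothetical helper, and 90 days is only an approximation of 3 months.

```shell
# Hypothetical helper: print an ISO 8601 interval covering everything older
# than N days. Requires GNU date for the -d "@epoch" syntax.
# $1 = reference time in epoch seconds, $2 = number of days to keep.
older_than_interval() (
  ref="$1"; days="$2"
  cutoff=$((ref - days * 86400))
  end=$(date -u -d "@$cutoff" +%Y-%m-%dT%H:%M:%S.000Z)
  # 1000-01-01 is an assumed start that predates any real data.
  printf '1000-01-01T00:00:00.000Z/%s\n' "$end"
)
```

Passing "$(date -u +%s)" as the reference time gives a cutoff relative to now; the printed interval can then be used as the "interval" value in the Option 2 request.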

Delete Segments Using a Kill Task

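Now that the segments are marked as unused, they are no longer served to queries, but their files still live in deep storage. To permanently delete them, submit a kill task to the Overlord. A kill task deletes only segments that are already marked as unused within the given interval; used segments in that interval are not affected.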

Syntax

curl -X POST http://<OVERLORD_HOST>:<PORT>/druid/indexer/v1/task \
  -H 'Content-Type: application/json' \
  -d '
    {
      "type": "kill",
      "dataSource": "DATA_SOURCE",
      "interval": "START_INTERVAL/END_INTERVAL"
    }
  '
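
The kill task request can be scripted in the same style. This is a minimal sketch with hypothetical helper names; it builds and posts the same JSON as the syntax above and adds nothing else to the task spec.

```shell
# Hypothetical helper: build the kill task spec.
# $1 = datasource, $2 = interval start, $3 = interval end.
kill_task_payload() (
  printf '{"type": "kill", "dataSource": "%s", "interval": "%s/%s"}\n' "$1" "$2" "$3"
)

# Hypothetical helper: submit the spec to the Overlord's task endpoint.
# $1 = base URL, $2 = datasource, $3 = interval start, $4 = interval end.
submit_kill_task() (
  host="$1"
  kill_task_payload "$2" "$3" "$4" |
    curl -s -X POST "$host/druid/indexer/v1/task" \
      -H 'Content-Type: application/json' \
      --data-binary @-
)
```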

Example

curl -X POST http://localhost:8888/druid/indexer/v1/task \
  -H 'Content-Type: application/json' \
  -d '
    {
      "type": "kill",
      "dataSource": "blog_visits",
      "interval": "2025-04-18T00:00:00.000Z/2025-04-18T11:00:00.000Z"
    }
  '

The response contains the ID of the submitted kill task:

{"task":"kill_blog_visits_hacplbld_2025-04-18T00:00:00.000Z_2025-04-18T11:00:00.000Z_2025-04-18T17:27:15.828Z"}
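
A kill task runs asynchronously, so the submit response only tells you that the task was accepted. To follow it, you can pass the returned ID to the Overlord's task status endpoint, GET /druid/indexer/v1/task/<taskId>/status. The sketch below uses hypothetical helper names; task_id_from_response pulls the ID out with sed rather than a JSON parser, so it assumes the compact {"task":"..."} shape shown above.

```shell
# Hypothetical helper: extract the task ID from a {"task":"..."} response.
task_id_from_response() (
  printf '%s\n' "$1" | sed -n 's/.*"task" *: *"\([^"]*\)".*/\1/p'
)

# Hypothetical helper: fetch the task status from the Overlord.
# $1 = base URL, $2 = task ID.
task_status() (
  curl -s "$1/druid/indexer/v1/task/$2/status"
)
```

The status response is JSON that includes the task state, such as RUNNING, SUCCESS, or FAILED.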

