Friday, 1 August 2025

Range Index in Apache Pinot

The Range Index in Apache Pinot divides numeric values into predefined ranges (buckets). These ranges are created during segment creation, and helps Pinot quickly filter queries like price > 500 or age BETWEEN 20 AND 30 kinds of queries efficiently.

 

Let’s take a sample dataset:

product_id

name

price

timestamp

1

Laptop

1000

1700000000

2

Phone

700

1700005000

3

Tablet

500

1700010000

4

Keyboard

50

1700020000

5

Monitor

300

1700030000

 

Let's a Range Index on:

·      price (INT)

·      timestamp (LONG)

 

Range Index for price

Range

Records in this range

0-250

[]

251-500

[3, 5]

501-750

[2]

751-1000

[1]

 

Range Index for timestamp

Range

Records in this range

1699999000-1700005000

[1, 2]

1700005001-1700015000

[3]

1700015001-1700025000

[4]

1700025001-1700035000

[5]

 

For example, when you execute following query

SELECT * FROM products WHERE price > 500;

 

How Pinot Uses the Range Index?

·      Pinot finds relevant ranges:

o   It skips 0-250 and 251-500 ranges.

o   It directly looks up 501-750 and 751-1000, and get the records {2, 1}

·      It retrieves rows 2 and 1 (Phone and Laptop).

·      No full table scan is needed in this case

 

In summary,

·      Range Indexes are optimized for BETWEEN, >, < queries.

·      Best for analytics on numeric/timestamp columns.

 

Previous                                                    Next                                                    Home

No comments:

Post a Comment