The Range Index in Apache Pinot divides numeric values into predefined ranges (buckets). These ranges are created during segment creation, and helps Pinot quickly filter queries like price > 500 or age BETWEEN 20 AND 30 kinds of queries efficiently.
Let’s take a sample dataset:
product_id |
name |
price |
timestamp |
1 |
Laptop |
1000 |
1700000000 |
2 |
Phone |
700 |
1700005000 |
3 |
Tablet |
500 |
1700010000 |
4 |
Keyboard |
50 |
1700020000 |
5 |
Monitor |
300 |
1700030000 |
Let's a Range Index on:
· price (INT)
· timestamp (LONG)
Range Index for price
Range |
Records in this range |
0-250 |
[] |
251-500 |
[3, 5] |
501-750 |
[2] |
751-1000 |
[1] |
Range Index for timestamp
Range |
Records in this range |
1699999000-1700005000 |
[1, 2] |
1700005001-1700015000 |
[3] |
1700015001-1700025000 |
[4] |
1700025001-1700035000 |
[5] |
For example, when you execute following query
SELECT * FROM products WHERE price > 500;
How Pinot Uses the Range Index?
· Pinot finds relevant ranges:
o It skips 0-250 and 251-500 ranges.
o It directly looks up 501-750 and 751-1000, and get the records {2, 1}
· It retrieves rows 2 and 1 (Phone and Laptop).
· No full table scan is needed in this case
In summary,
· Range Indexes are optimized for BETWEEN, >, < queries.
· Best for analytics on numeric/timestamp columns.
Previous Next Home
No comments:
Post a Comment