Apache Pinot is a real-time distributed OLAP datastore designed for low-latency analytics. Indexes help Pinot optimize query performance. This guide walks through the following steps:
· Understand how indexes are created
· Define a schema
· Define a table configuration with an index
· Upload the schema and table
· Verify the table and query it
How to Create Indexes?
Indexes in Apache Pinot are specified inside the "tableIndexConfig" object of the table configuration (table.json).
{
  "tableName": "somePinotTable",
  "tableType": "OFFLINE",
  "segmentsConfig": {
    "schemaName": "someSchema",
    "timeColumnName": "eventTime",
    "timeType": "MILLISECONDS"
  },
  "tableIndexConfig": {
    "bloomFilterColumns": ["playerID"],
    "sortedColumn": ["eventTime"],
    "invertedIndexColumns": ["category"],
    "rangeIndexColumns": ["price"],
    "starTreeIndexConfigs": [
      {
        "dimensionsSplitOrder": ["Country", "Browser", "Locale"],
        "functionColumnPairs": ["SUM__Impressions"],
        "maxLeafRecords": 1
      }
    ]
  },
  "tenants": {
    "broker": "DefaultTenant",
    "server": "DefaultTenant"
  }
}
Indexes can be declared this way when a new table is created.
To add or modify an index after the table exists, update the table configuration through Pinot's REST API.
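As a rough sketch of that update flow, the Python below adds a column to invertedIndexColumns in a table configuration and builds (without sending) a PUT request for the controller's /tables/{tableName} endpoint. The controller address http://localhost:9000 and the helper names with_inverted_index and build_update_request are assumptions made for illustration, not part of Pinot. Segments that are already loaded typically need to be reloaded before a newly added index takes effect.

```python
import copy
import json
import urllib.request

# Assumption: a Pinot controller running locally on the default port.
CONTROLLER_URL = "http://localhost:9000"


def with_inverted_index(table_config, column):
    """Return a copy of table_config with `column` listed in invertedIndexColumns."""
    updated = copy.deepcopy(table_config)
    index_config = updated.setdefault("tableIndexConfig", {})
    columns = index_config.setdefault("invertedIndexColumns", [])
    if column not in columns:
        columns.append(column)
    return updated


def build_update_request(table_config, controller_url=CONTROLLER_URL):
    """Build, but do not send, a PUT /tables/{tableName} request."""
    body = json.dumps(table_config).encode("utf-8")
    return urllib.request.Request(
        url=f"{controller_url}/tables/{table_config['tableName']}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    current = {
        "tableName": "somePinotTable",
        "tableType": "OFFLINE",
        "tableIndexConfig": {"invertedIndexColumns": ["category"]},
    }
    updated = with_inverted_index(current, "playerID")
    request = build_update_request(updated)
    print(request.get_method(), request.full_url)
    # To send it to a running controller, uncomment:
    # with urllib.request.urlopen(request) as response:
    #     print(response.read().decode("utf-8"))
```

Keeping the config change separate from the HTTP call makes it possible to review the updated JSON before anything reaches the cluster.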
Let’s follow the step-by-step procedure below to define an inverted index on the country column.
Step 1: Define a Schema
A schema in Pinot describes the structure of the data, defining columns and data types.
country_schema.json
{
  "schemaName": "country_schema",
  "dimensionFieldSpecs": [
    { "name": "country", "dataType": "STRING" },
    { "name": "city", "dataType": "STRING" }
  ],
  "metricFieldSpecs": [
    { "name": "population", "dataType": "INT" }
  ],
  "dateTimeFieldSpecs": [
    {
      "name": "eventTime",
      "dataType": "LONG",
      "format": "1:MILLISECONDS:EPOCH",
      "granularity": "1:MILLISECONDS"
    }
  ]
}
Let’s onboard this schema into Pinot using the Swagger user interface.
Open the Pinot user interface by navigating to http://localhost:9000.
Click the ‘Swagger REST API’ option in the left navigation bar to open the Swagger user interface.
Execute the ‘POST /schemas’ API with the country_schema.json payload to onboard the schema.
Upon successful insertion of the schema, you will see the following response.
{ "unrecognizedProperties": {}, "status": "country_schema successfully added" }
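The same upload can also be scripted rather than clicked through Swagger. The sketch below builds a POST /schemas request carrying the schema as a JSON body; it assumes the controller at http://localhost:9000 accepts an application/json payload on that endpoint, and the helper names build_schema_request and send are hypothetical.

```python
import json
import urllib.request

# Assumption: a Pinot controller running locally on the default port.
CONTROLLER_URL = "http://localhost:9000"

# Same content as country_schema.json.
COUNTRY_SCHEMA = {
    "schemaName": "country_schema",
    "dimensionFieldSpecs": [
        {"name": "country", "dataType": "STRING"},
        {"name": "city", "dataType": "STRING"},
    ],
    "metricFieldSpecs": [{"name": "population", "dataType": "INT"}],
    "dateTimeFieldSpecs": [
        {
            "name": "eventTime",
            "dataType": "LONG",
            "format": "1:MILLISECONDS:EPOCH",
            "granularity": "1:MILLISECONDS",
        }
    ],
}


def build_schema_request(schema, controller_url=CONTROLLER_URL):
    """Build, but do not send, a POST /schemas request with a JSON body."""
    return urllib.request.Request(
        url=f"{controller_url}/schemas",
        data=json.dumps(schema).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def send(request):
    """Send the request and decode the JSON response (needs a running controller)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    request = build_schema_request(COUNTRY_SCHEMA)
    print(request.get_method(), request.full_url)
    # print(send(request))  # uncomment when a controller is reachable
```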
Step 2: Define a Table with an Index
A table in Pinot is associated with a schema and contains configurations for indexing and ingestion.
country_table.json
{
  "tableName": "country_schema",
  "tableType": "OFFLINE",
  "segmentsConfig": {
    "timeColumnName": "eventTime",
    "timeType": "MILLISECONDS",
    "retentionTimeUnit": "DAYS",
    "retentionTimeValue": "30",
    "schemaName": "country_schema",
    "segmentAssignmentStrategy": "Balanced",
    "replication": "1"
  },
  "tableIndexConfig": {
    "invertedIndexColumns": ["country"]
  },
  "tenants": {
    "broker": "DefaultTenant",
    "server": "DefaultTenant"
  },
  "metadata": {}
}
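Before uploading a table configuration, it can be worth checking that every indexed column actually exists in the schema. The following is a minimal local check, not a Pinot API: the helper names schema_columns and unknown_index_columns are hypothetical, and only the list-style index keys that appear in this guide are inspected.

```python
# List-valued index keys under tableIndexConfig that this sketch inspects.
INDEX_LIST_KEYS = (
    "invertedIndexColumns",
    "rangeIndexColumns",
    "bloomFilterColumns",
    "sortedColumn",
)


def schema_columns(schema):
    """Collect every column name declared in the schema's field spec lists."""
    names = set()
    for key in ("dimensionFieldSpecs", "metricFieldSpecs", "dateTimeFieldSpecs"):
        for spec in schema.get(key, []):
            names.add(spec["name"])
    return names


def unknown_index_columns(schema, table_config):
    """Map each index key to the indexed columns that the schema does not define."""
    known = schema_columns(schema)
    index_config = table_config.get("tableIndexConfig", {})
    problems = {}
    for key in INDEX_LIST_KEYS:
        unknown = [col for col in index_config.get(key, []) if col not in known]
        if unknown:
            problems[key] = unknown
    return problems


if __name__ == "__main__":
    schema = {
        "schemaName": "country_schema",
        "dimensionFieldSpecs": [
            {"name": "country", "dataType": "STRING"},
            {"name": "city", "dataType": "STRING"},
        ],
        "metricFieldSpecs": [{"name": "population", "dataType": "INT"}],
        "dateTimeFieldSpecs": [{"name": "eventTime", "dataType": "LONG"}],
    }
    table = {"tableIndexConfig": {"invertedIndexColumns": ["country"]}}
    # An empty dict means every indexed column is defined in the schema.
    print(unknown_index_columns(schema, table))
```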
Execute the ‘POST /tables’ API with the country_table.json payload to onboard the table.
Upon successful insertion of the table definition, you will see the following response.
{ "unrecognizedProperties": {}, "status": "Table country_schema_OFFLINE successfully added" }
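To confirm that the index made it into the stored configuration, the table config can be read back from the controller. The sketch below builds a GET /tables/{tableName} request and extracts invertedIndexColumns from the response. It assumes the response is a JSON object keyed by table type (for example "OFFLINE"); that shape, the controller address, and the helper names are assumptions to check against your Pinot version.

```python
import json
import urllib.request

# Assumption: a Pinot controller running locally on the default port.
CONTROLLER_URL = "http://localhost:9000"


def build_get_table_request(table_name, controller_url=CONTROLLER_URL):
    """Build, but do not send, a GET /tables/{tableName} request."""
    return urllib.request.Request(
        url=f"{controller_url}/tables/{table_name}",
        method="GET",
    )


def inverted_index_columns(response_body, table_type="OFFLINE"):
    """Extract invertedIndexColumns, assuming a response keyed by table type."""
    payload = json.loads(response_body)
    config = payload.get(table_type, {})
    return config.get("tableIndexConfig", {}).get("invertedIndexColumns", [])


if __name__ == "__main__":
    request = build_get_table_request("country_schema")
    print(request.get_method(), request.full_url)

    # Hand-written sample in the assumed response shape, not real controller output.
    sample = json.dumps(
        {
            "OFFLINE": {
                "tableName": "country_schema_OFFLINE",
                "tableIndexConfig": {"invertedIndexColumns": ["country"]},
            }
        }
    )
    print(inverted_index_columns(sample))
```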