Apache Pinot is a real-time OLAP datastore that organizes data efficiently for fast analytics. In this post, I am going to explain the core components of Apache Pinot.
1. Table
A table in Apache Pinot is a logical container for your data, similar to a table in a traditional database. It defines how the data is stored, queried, and indexed.
1.1 Types of Tables in Pinot
Pinot supports three types of tables based on how data is ingested:
Table Type |
Description |
Best suited for |
Realtime Table |
Stores data coming from streaming sources like Kafka or Kinesis |
Live dashboards, real-time monitoring
|
Offline Table |
Stores batch data from sources like Hadoop, S3, or GCS |
Historical data analysis |
Hybrid Table |
A combination of both real-time and offline tables |
Unified view of both real-time and historical data |
For example, If you are tracking user clicks on a website, you can use a Realtime Table to get fresh events instantly and an Offline Table to store older data for long-term analysis.
2. Segments
A segment is a small chunk of data stored in a Pinot table. Instead of keeping all the table data in one huge file, Pinot breaks it into segments for better performance and faster queries.
2.1 Why does Pinot use Segments?
· Faster Queries: Queries only scan relevant segments instead of the whole table.
· Efficient Storage: Data is compressed and indexed for quick lookups.
· Scalability: New segments can be added without affecting existing ones.
· Parallel Processing: Different servers can handle different segments for faster results.
2.2 How Tables and Segments Work Together?
Imagine you are tracking customer purchases:
Table: "customer_purchases" (stores all purchase data).
Segments:
· "customer_purchases_2024-03-01" (purchases from March 1, 2024).
· "customer_purchases_2024-03-02" (purchases from March 2, 2024).
· "customer_purchases_live" (real-time purchases happening now).
When a query runs, Pinot scans only relevant segments instead of searching the entire dataset, making queries super fast.
3. Schema
A schema in Apache Pinot defines the structure of a table. It specifies:
· Column Names: The attributes that will store data.
· Data Types: The format in which data is stored (e.g., INT, STRING, DOUBLE).
· Field Types: The role of each column (dimension, metric, datetime, complex).
· Storage and Indexing Details: How data is organized and retrieved efficiently.
A schema is essential for both:
· Offline Tables (batch ingested from sources like Hadoop, S3, etc.).
· Real-time Tables (ingested from streaming sources like Kafka).
For hybrid tables, both offline and real-time components should use the same schema to ensure consistency.
A Pinot schema consists of the following main sections:
a. schemaName: The name of the schema, which should ideally match the table name.
"schemaName": "customer_transactions"
b. dimensionFieldSpecs (Dimensions): Used for filtering, grouping, and categorization of data.
Characteristics:
· Contains categorical or descriptive information (e.g., customer_id, city).
· Can be single-value (default) or multi-value (lists/arrays).
· Requires a default null value for missing data.
{ "dimensionFieldSpecs": [ { "name": "customer_id", "dataType": "STRING", "defaultNullValue": null }, { "name": "product_category", "dataType": "STRING", "defaultNullValue": null }, { "name": "tags", "dataType": "STRING", "singleValueField": false, "defaultNullValue": null } ] }
In this example,
· customer_id and product_category are single-value dimensions.
· tags is a multi-valued dimension, meaning it can hold multiple values.
c. metricFieldSpecs (Metrics): Stores numerical values used in aggregations (SUM, AVG, COUNT).
Characteristics:
· Contains numerical fields like sales_amount, clicks, views.
· Requires a default null value (like 0 for counts).
{ "metricFieldSpecs": [ { "name": "sales_amount", "dataType": "DOUBLE", "defaultNullValue": 0.0 }, { "name": "items_sold", "dataType": "INT", "defaultNullValue": 0 } ] }
Here,
· sales_amount: Stores floating point values for revenue.
· items_sold: Stores integer values representing item count.
d. dateTimeFieldSpecs (Time-based Fields): Stores time-based information (timestamps, dates).
Characteristics:
· Defines the format (EPOCH, SIMPLE_DATE_FORMAT, TIMESTAMP).
· Defines granularity (second, minute, day).
· Used for time-based queries and data retention policies.
{ "dateTimeFieldSpecs": [ { "name": "event_time", "dataType": "LONG", "format": "1:MILLISECONDS:EPOCH", "granularity": "1:DAYS" } ] }
e. complexFieldSpecs (Complex Data Types): Defines MAP structures for key-value data.
Characteristics:
· Key is always a STRING.
· Value can be a supported data type (STRING, INT, etc.).
{ "complexFieldSpecs": [ { "name": "customer_preferences", "dataType": "MAP", "fieldType": "COMPLEX", "childFieldSpecs": [ { "name": "preference_key", "dataType": "STRING", "fieldType": "DIMENSION" }, { "name": "preference_value", "dataType": "STRING", "fieldType": "DIMENSION" } ] } ] }
Above snippet stores customer preferences as key-value pairs.
Following table summarizes the different data types supported by Apache Pinot.
Data Type |
Description |
INT |
Integer values (e.g., 10, 2000) |
LONG |
Large integer values (e.g., 123456789) |
FLOAT |
Floating-point numbers (1.23) |
DOUBLE |
Double precision floating-point numbers (1234.5678) |
BIG_DECIMAL |
Arbitrary precision decimal numbers |
BOOLEAN |
Stores true or false values |
TIMESTAMP |
Stores timestamp values |
STRING |
Textual data ("Hello World") |
JSON |
Stores JSON objects |
BYTES |
Stores binary data (e.g., images, encoded data) |
Handling Special Cases
· Negative Zero (-0.0): Converted to positive zero (0.0).
· NaN (Not a Number): Converted to default null value.
· BIG_DECIMAL: Removes trailing zeros (100.00 → 100).
Handling Null Values in Pinot
Pinot does not store nulls explicitly, instead the defaultNullValue is used.
{ "name": "user_age", "dataType": "INT", "defaultNullValue": -1 }
If user_age is missing, it defaults to -1. Since Pinot 1.1.0, enableColumnBasedNullHandling allows finer control over null values.
Built-in Virtual Columns
Pinot provides virtual columns for debugging:
Virtual Column |
Purpose |
$hostName |
Server hosting the data |
$segmentName |
Segment containing the record |
$docId |
Unique document ID |
These can be used in queries like below.
SELECT $segmentName, customer_id FROM customer_transactions;
4. Servers
A Server in Apache Pinot is responsible for storing and processing data segments. It handles queries by scanning the relevant data and returning the results.
Key Responsibilities of a Server
· Data Storage: A server stores data segments, which are units of partitioned data in Pinot. These segments can be generated from batch ingestion (e.g., Hadoop, S3, GCS) or real-time ingestion (e.g., Kafka, Pulsar).
· Query Execution: When a broker sends a query, the server scans the relevant segments and processes the query. It performs operations like filtering, aggregations, and transformations before returning results to the broker.
· Segment Management: The server is responsible for loading, unloading, and managing the lifecycle of data segments. Pinot supports segment replication, so multiple servers may store copies of the same segment for fault tolerance and load balancing.
· Resource Allocation: Pinot allows separating servers based on workload. This separation allows for better resource optimization.
o Offline Servers: Handle batch ingested data (historical data).
o Realtime Servers: Handle streaming data (real-time ingestion).
Pinot servers are horizontally scalable. More servers can be added to handle larger datasets and higher query loads. Since data is stored in segments, new servers can host additional segments without affecting existing nodes.
5. Brokers
A Broker in Apache Pinot is responsible for query routing and result aggregation. It provides a REST API endpoint for clients to execute queries and directs the queries to appropriate servers.
Key Responsibilities of a Broker
· Query Interface: Brokers expose a REST API that allows clients to send SQL or PQL (Pinot Query Language) queries. Users interact with Pinot through brokers, not directly with servers.
· Query Routing (Scatter-Gather): When a query arrives, the broker determines which servers store the relevant data segments. The broker forwards the query to multiple servers (scatter phase). Each server processes the query on its respective segments and returns partial results.
· Result Aggregation: The broker combines the partial results from different servers (gather phase). It applies final sorting, filtering, and aggregation before sending the final result back to the client.
· Routing Table Management: Brokers maintain a routing table that maps which servers hold which segments. This routing table is updated dynamically as segments move or new servers are added.
Just like Servers, Brokers can also be horizontally scaled. More brokers can be added to handle higher query volumes. Brokers do not store data, so they can be independently scaled based on query traffic.
How Servers and Brokers Work Together?
· Client submits a query to the broker.
· Broker looks up the routing table to find relevant servers.
· Broker forwards the query to selected servers.
· Servers process the query on their segments and return partial results.
· Broker aggregates and returns the final result to the client.
In the above example,
· Client Submits a Query: The end-user (Client) sends a query to the Broker to fetch specific data.
· Broker Checks the Routing Table: The Broker consults the Routing Table, which maps which servers hold the required file segments. The table shows that Server 2 and Server 3 contain the relevant data.
· Broker Forwards the Query: The Broker sends the query only to Server 2 and Server 3.
· Servers Process the Query: Server 2 and Server 3 process the query on their respective segments and generate partial results.
· Servers Return Partial Results: Both servers send their partial results back to the Broker.
· Broker Aggregates and Sends the Final Result: The Broker combines the results from Server 2 and Server 3. It processes and formats the final result before sending it back to the Client.
· Client Receives the Final Output.
6. Controllers
A Pinot Controller is a core component responsible for managing the metadata, configurations, and overall cluster operations in Apache Pinot. It plays a crucial role in coordinating the storage and query layers by ensuring proper table and segment management.
Key Responsibilities of a Pinot Controller
· Cluster Management with Helix: Apache Pinot uses Apache Helix for cluster management. Helix ensures that the Pinot cluster remains in a healthy state by monitoring nodes and automatically recovering from failures. The controller interacts with Helix to manage cluster configurations and assignments.
· Storing Metadata in ZooKeeper: The Pinot Controller relies on Apache ZooKeeper to store and manage metadata like Cluster state information (available brokers, servers, controllers), Table configurations and schemas, Segment locations and assignments, and Leader election and cluster consistency etc., ZooKeeper acts as a persistent metadata store, ensuring that all nodes in the cluster have up-to-date information.
· Leader Election and High Availability: Pinot Controllers are horizontally scalable, meaning multiple controllers can be deployed for fault tolerance. However, at any given time, only one controller acts as the leader, elected through Apache Helix. The leader controller is responsible for managing table creation, segment assignments, and retention policies. If the leader controller fails, Helix automatically elects a new leader from the available controllers.
· Segment and Table Management: The controller plays a critical role in handling Table creation and updates, assigning segments to real-time or offline servers, Retention policies to delete old segments. Schema validation and enforcement. Without a running controller, new tables and segments cannot be created or modified.
· Query Processing Independence: One of Pinot's strengths is that query processing does not depend on controllers. If all controllers go down, brokers and servers continue to function normally, and queries can still be executed. However, for administrative tasks like table creation, segment assignment, and retention enforcement, at least one controller must be active.
Brokers in Pinot do not maintain table-to-segment mappings independently. Instead, they fetch this metadata from controllers, which retrieve it from ZooKeeper. This ensures that brokers always have the latest information on where each segment is stored and how to route queries efficiently.
What Happens When a Controller Fails?
Since controllers are horizontally scalable, failure of one controller does not disrupt the cluster. However, if all controllers fail simultaneously:
· Query execution remains unaffected since brokers and servers continue working.
· New table creation, schema updates, and segment assignments will be blocked until at least one controller is restored.
7. Tenants
A tenant in Apache Pinot represents a logical grouping of resources, allowing multiple use cases to coexist in the same cluster while remaining isolated from one another. This helps in:
· Ensuring fair resource allocation among different teams.
· Preventing noisy neighbor problems, where one workload might consume excessive resources and impact others.
· Simplifying administration by logically grouping brokers and servers for different use cases.
Types of Tenants in Apache Pinot
Apache Pinot supports two types of tenants:
· Broker Tenant: A group of brokers dedicated to handling queries for a particular use case.
· Server Tenant: A group of servers responsible for storing and serving data for a particular use case.
This separation ensures that query execution and data storage for different applications or teams are managed independently.
7.1 How Tenants Work in Pinot
Each Pinot server and broker can be tagged with a tenant name, which ensures that they serve only the tables and queries related to their assigned tenant.
Tagging Brokers and Servers with Tenants: When setting up a Pinot cluster, admins can tag specific brokers and servers with tenant names. These tags ensure that queries from a particular tenant are handled only by designated brokers and servers.
For example,
· Marketing Team: marketingBroker, marketingServer
· Product Analytics Team: productBroker, productServer
Assigning Tenants to Tables: When creating a table in Pinot, admins specify which broker and server tenants should handle it. This ensures that each table’s data is processed and queried only within its assigned tenant.
Example table configuration:
{ "tableName": "user_events", "tenant": "marketingTenant", "tableType": "REALTIME" }
Resource Quotas for Tenants: Pinot allows quotas to be set on tenants to prevent excessive resource consumption.
· Segment Quotas: Limits the total storage used by a tenant.
· Query Quotas: Monitors query execution limits (though enforcement is handled externally).
This helps ensure fair usage and prevents overloading the cluster.
What Happens If Tenants Are Not Used?
Without tenant isolation:
· One team’s heavy queries may impact the performance of another team’s reports.
· Storage consumption may become unmanageable without segment quotas.
· Cluster administration becomes more complex, leading to inefficient resource allocation.
8. Cluster
Pinot operates as a cluster-based system, which consists of different components working together.
A Pinot Cluster is composed of the following key components:
· Controllers
· Brokers
· Servers
· Minions
Each of these components has a specific role in managing, storing, and querying data.
A Pinot cluster is logically divided into three categories based on their function:
· Participants: Participants are responsible for storing and processing data. Pinot Servers fall under this category.
· Spectators (Observe Participants and Query Data): Spectators do not store data, but they observe participants and fetch data from them when required. Pinot Brokers fall under this category. Brokers receive queries from users, determine which servers store the necessary data, and route the queries accordingly. Brokers return the final aggregated result to the user.
· Controllers (Manage the Cluster): Controllers manage and coordinate the Pinot cluster. They are responsible for metadata storage, table management, segment assignments, and task execution. Controllers act as leaders, managing the state of participants (servers) and ensuring stability. If the leader controller fails, another one is automatically elected to take over.
Additional Components
Minions: These are used for background tasks like segment merging, purging, and complex data transformations. They offload expensive operations from Pinot servers to maintain high query performance. For example, If raw data needs to be pre-processed before storage, Minions handle this in the background.
Previous Next Home
No comments:
Post a Comment