Sunday, 10 August 2025

Druid Architecture

Druid is a distributed system: it runs on a group of computers that work together. It's built to work well in the cloud and is designed to be easy to manage. You can scale or adjust different parts of the system independently, which gives you a lot of control and flexibility.

Druid is also built to be fault-tolerant: if one component stops working, the rest of the system can keep running without crashing.

 

The diagram below shows the different parts of Druid, how they are usually set up across machines, and how data is stored and queries are executed in the system.

 


Druid Services

Druid is made up of different parts, and each part has a special job:

 

·      Coordinator: Manages data availability on the cluster.

·      Overlord: Controls the assignment of data ingestion workloads.  In simple terms, it manages the process of bringing new data into Druid.

·      Broker: Handles queries from external clients.

·      Router: Routes requests to Brokers, Coordinators, and Overlords.

·      Historical: Stores data that’s ready to be searched and used in queries.

·      Middle Manager + Peon: Work together to bring in and process new data.

·      Indexer: A newer option that can do the same work as the Middle Manager and Peon, but in a simpler way.

 

You can view these services in the Services tab of the web console.

 


1. Master Server

In a Druid cluster, a Master Server is not a single machine, but rather a logical role performed by two important services:

 

·      Coordinator Service

·      Overlord Service

 

Together, these two "brain-like" components manage the whole cluster's behavior. They don’t process queries or store data directly; instead, they make decisions about where and how data is stored and how new data should be ingested.

 

1.1 Coordinator Service ("The Segment Manager")

 

The Coordinator is a Master Server process responsible for the management and optimization of data availability within the Druid cluster. Its key responsibilities include:

 

·      Segment Management: Tracking the location and health of data segments (the fundamental storage unit in Druid) across the Historical nodes.

·      Load Balancing: Ensuring that data segments are evenly distributed across the Historical nodes to optimize query performance and resource utilization.

·      Data Tiering: Managing the movement of segments between different storage tiers based on configured rules (e.g., moving older, less frequently accessed data to cheaper storage).

·      Compaction: Initiating tasks to merge smaller segments into larger ones to improve query efficiency.

·      Data Retention: Enforcing data retention policies by identifying and marking old segments for deletion.

The Coordinator communicates with the Historical nodes to assign and manage segments. Typically, in a production environment, you would have one active Coordinator and one or more standby Coordinators for high availability.
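As a small sketch of the data-retention responsibility above, the snippet below builds a retention rule set and the Coordinator endpoint it would be sent to. The Coordinator address, datasource name, tier name, and replica count are illustrative placeholders; only the request is constructed here, nothing is sent.

```python
import json

# Hypothetical Coordinator address and datasource name, for illustration.
coordinator = "http://localhost:8081"
datasource = "wikipedia"

# Retention rules are evaluated top to bottom: keep the most recent month
# on the default tier with two replicas, then drop everything older.
rules = [
    {
        "type": "loadByPeriod",
        "period": "P1M",
        "includeFuture": True,
        "tieredReplicants": {"_default_tier": 2},
    },
    {"type": "dropForever"},
]

# The rule set would be POSTed to the Coordinator's rules endpoint,
# e.g. requests.post(url, json=rules); here we only build the request.
url = f"{coordinator}/druid/coordinator/v1/rules/{datasource}"
body = json.dumps(rules)
```

Ordering matters: because rules are matched top to bottom, the broad `dropForever` rule goes last so it only applies to segments the load rule did not match.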

 

1.2 Overlord Service

The Overlord is another Master Server process that manages and orchestrates data ingestion tasks. Its main responsibilities include:

 

·      Task Management: Accepting and managing the lifecycle of ingestion tasks submitted by clients or other Druid processes (like MiddleManagers).

·      Task Assignment: Assigning ingestion tasks to available MiddleManager nodes (or Peons/Indexers in other ingestion models).

·      Task Monitoring: Tracking the progress and status of running ingestion tasks.

·      Coordination of Segment Publishing: Ensuring that newly created data segments are properly published to deep storage and made available for querying by the Historical nodes (informing the Coordinator about new segments).

Similar to the Coordinator, you would typically have an active Overlord and one or more standby Overlords for fault tolerance.
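To make the task-management role concrete, here is a minimal native batch ("index_parallel") task spec of the kind a client submits to the Overlord. The file path, datasource name, and column names are placeholders; the snippet only builds the request body and endpoint, it does not submit anything.

```python
import json

# A minimal native batch ingestion task spec (illustrative values).
task = {
    "type": "index_parallel",
    "spec": {
        "ioConfig": {
            "type": "index_parallel",
            "inputSource": {"type": "local", "baseDir": "/tmp/data", "filter": "*.json"},
            "inputFormat": {"type": "json"},
        },
        "dataSchema": {
            "dataSource": "events",
            "timestampSpec": {"column": "timestamp", "format": "iso"},
            "dimensionsSpec": {"dimensions": ["page", "user"]},
            "granularitySpec": {"segmentGranularity": "day", "queryGranularity": "none"},
        },
        "tuningConfig": {"type": "index_parallel"},
    },
}

# Tasks are submitted to the Overlord's task endpoint (default port 8090);
# the Overlord then assigns the work to a MiddleManager and tracks its status.
overlord = "http://localhost:8090"
submit_url = f"{overlord}/druid/indexer/v1/task"
body = json.dumps(task)
```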

 

2. Query Server

In Druid, Query Server refers to the components that receive, route, and respond to user queries. These are the services that power analytics by turning questions into answers.

 

There are two main services that play a role in the query layer.

 

·      Broker Service (core Query Server)

·      Router Service (optional Query Gateway)

 

Historical and MiddleManager services are data servers that process parts of each query. They are not considered Query Servers themselves, but they help fulfill queries.

 

2.1 Router Service

The Router is an optional Query Server process that provides a unified API gateway in front of the Broker, Coordinator, and Overlord services. Its benefits include:

 

·      Simplified Client Interaction: External clients only need to interact with the Router's endpoint, which then handles routing the requests to the appropriate backend service. This simplifies client configurations and can provide a more stable API.

·      Load Balancing: The Router can perform basic load balancing across multiple Broker, Coordinator, and Overlord instances.

·      Security and Authentication: The Router can act as a central point for implementing security measures like authentication and authorization.

While optional, Routers are highly recommended for larger, production deployments as they enhance manageability, scalability, and security.

 

The Router service also runs the web console, a UI for loading data, managing datasources and tasks, and viewing server status and segment information.
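As a sketch of the "single endpoint" idea, the snippet below builds a Druid SQL request of the kind a client would send to the Router (default port 8888), which forwards it to a Broker. The datasource and query are hypothetical; the request is only constructed, not sent.

```python
import json

# Clients only need the Router's address; it routes the SQL query to a Broker.
router = "http://localhost:8888"  # default Router port, illustrative host

sql_request = {
    "query": "SELECT channel, COUNT(*) AS edits FROM wikipedia GROUP BY channel",
    "resultFormat": "object",
}

# Would be sent with, e.g., requests.post(url, json=sql_request).
url = f"{router}/druid/v2/sql"
body = json.dumps(sql_request)
```

Because the Router fronts the Coordinator and Overlord APIs as well, the same base address can be used for management calls, which is what keeps client configuration simple.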

 

2.2 Broker Service

The Broker is a Query Server process that acts as the entry point for external client queries. Its primary responsibilities include:

 

·      Query Reception: Receiving queries (typically in Druid’s JSON-based query language or SQL) from applications and users.

·      Query Planning and Distribution: Understanding the query and determining which Historical nodes (and potentially MiddleManagers for real-time data) contain the necessary data segments.

·      Query Routing: Forwarding sub-queries to the relevant data nodes (Historicals and MiddleManagers).

·      Result Merging and Aggregation: Collecting the results from the data nodes, performing any final merging or aggregation required by the original query, and returning the final result to the client.

Brokers maintain a cache of segment metadata to efficiently route queries. You can scale out Broker services to handle a higher volume of concurrent queries.
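To illustrate the query flow above, here is a minimal native timeseries query of the kind a Broker receives: the Broker fans the per-segment work out to Historicals (and MiddleManagers for real-time data) and merges the partial results. The datasource, interval, and metric names are placeholders; only the request is built.

```python
import json

# A minimal native timeseries query (illustrative values).
query = {
    "queryType": "timeseries",
    "dataSource": "wikipedia",
    "granularity": "hour",
    "intervals": ["2025-08-01/2025-08-02"],
    "aggregations": [
        {"type": "count", "name": "rows"},
        {"type": "longSum", "name": "added", "fieldName": "added"},
    ],
}

broker = "http://localhost:8082"  # default Broker port, illustrative host
url = f"{broker}/druid/v2"        # native query endpoint
body = json.dumps(query)
```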

 

3. Data Server

A Data Server executes ingestion jobs and stores queryable data. Data Servers divide operations between Historical and MiddleManager services.

 

3.1 MiddleManager Service

The MiddleManager is a Data Server process that handles the active ingestion of data. Its primary responsibilities include:

 

·      Task Execution: Receiving ingestion tasks from the Overlord and spawning "Peon" processes (or running tasks as threads in "Indexer" service) to execute these tasks.

·      Data Processing: Reading data from the configured input source, transforming it according to the ingestion specification, and creating new Druid segments.

·      Temporary Storage: Using local disk for temporary storage during the data processing and segment creation phases.

·      Segment Hand-off: Once a segment is built, the MiddleManager coordinates the hand-off of the segment to deep storage and informs the Overlord about the completed task.

MiddleManagers are scalable: you can add more of them to increase your data ingestion throughput.
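For streaming ingestion, the tasks a MiddleManager runs are created from a supervisor spec submitted to the Overlord. Below is a minimal Kafka supervisor spec as a sketch; the topic name, Kafka broker address, and schema are placeholders, and the request is only constructed here.

```python
import json

# A minimal Kafka supervisor spec: the Overlord runs the supervisor, which
# spawns indexing tasks on MiddleManagers to consume the topic (illustrative values).
supervisor = {
    "type": "kafka",
    "spec": {
        "ioConfig": {
            "type": "kafka",
            "topic": "events",
            "consumerProperties": {"bootstrap.servers": "localhost:9092"},
            "inputFormat": {"type": "json"},
        },
        "dataSchema": {
            "dataSource": "events",
            "timestampSpec": {"column": "timestamp", "format": "iso"},
            "dimensionsSpec": {"dimensions": ["page", "user"]},
            "granularitySpec": {"segmentGranularity": "hour", "queryGranularity": "none"},
        },
        "tuningConfig": {"type": "kafka"},
    },
}

# Submitted to the Overlord's supervisor endpoint; built here only.
submit_url = "http://localhost:8090/druid/indexer/v1/supervisor"
body = json.dumps(supervisor)
```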

 

3.2 Historical Service

Historical services handle storage and querying on historical data, including any streaming data that has been in the system long enough to be committed. Historical services download segments from deep storage and respond to queries about these segments. They don't accept writes.

 

Its key characteristics include:

·      Segment Storage: Downloading and storing data segments from deep storage on its local disk.

·      Query Processing: Directly processing the parts of incoming queries that pertain to the data segments it holds.

·      No Writes: Historical nodes do not handle data ingestion (writes); they only serve read requests.

Historical nodes are designed to be scalable: you can add more of them to increase your capacity for storing and querying historical data. They are typically deployed on machines with significant disk space and memory (for caching frequently accessed data).

 

 

At a high level:

·      Coordinator manages data availability on the cluster.

·      Overlord controls the assignment of data ingestion workloads.

·      Broker handles queries from external clients.

·      Router routes requests to Brokers, Coordinators, and Overlords.

·      Historical stores queryable data.

·      Middle Manager and Peon ingest data.

 

In summary, Druid's architecture is designed for distributed, high-performance analytics with a clear separation of concerns among its different service types. This modularity allows for independent scaling and fault tolerance of each component, making Druid well-suited for handling large, real-time datasets and serving low-latency queries.

 
