Wednesday, 6 August 2025

How to Install Apache Druid: A Step-by-Step Guide for a Single-Server Quickstart

Apache Druid is a high-performance real-time analytics database built for fast slice-and-dice analytics ("OLAP" queries) on large data sets. If you're exploring real-time data ingestion, interactive dashboards, or need lightning-fast queries on time-series data, Druid is an excellent choice.

This guide walks you through setting up a Single-Server Druid instance on a macOS machine. This setup is great for development, testing, and local experimentation. Production-grade installations should use Druid’s clustered deployment architecture.

 

1. Prerequisites

Java 11 or 17 (the only versions officially supported by recent Druid releases)
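A quick way to confirm which major version your shell will use: the snippet below extracts the major number from a `java -version` banner line. The sample banner string is hard-coded for illustration; in practice, feed it the real output of `java -version 2>&1 | head -n 1`.

```shell
# Sample banner for illustration; replace with: java -version 2>&1 | head -n 1
banner='openjdk version "17.0.9" 2023-10-17'

# Pull out the leading major version number from inside the quotes
major=$(printf '%s' "$banner" | sed -E 's/.*"([0-9]+)[."].*/\1/')
echo "Major version: $major"

case "$major" in
  11|17) echo "Java $major is supported by recent Druid releases" ;;
  *)     echo "Java $major is not officially supported" ;;
esac
```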

 

2. Steps to Set Up Druid

Step 1: Download Apache Druid

Visit the Apache Druid Downloads Page (https://druid.apache.org/downloads/) and grab the latest stable version (e.g., apache-druid-32.0.1-bin.tar.gz).
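If you prefer the command line, you can construct the download link yourself. The mirror URL layout below follows the standard Apache CDN convention and is an assumption; verify the exact link on the downloads page before fetching.

```shell
# Version is an example; substitute the release you picked from the downloads page
DRUID_VERSION=32.0.1
TARBALL="apache-druid-${DRUID_VERSION}-bin.tar.gz"

# Standard Apache CDN layout (assumption; confirm on the downloads page)
URL="https://dlcdn.apache.org/druid/${DRUID_VERSION}/${TARBALL}"
echo "$URL"

# Uncomment to download the archive:
# curl -LO "$URL"
```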

 

Step 2: Extract the Archive and Start the Druid Cluster (Single-Server Quickstart)
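Assuming the tarball from Step 1 is in your current directory, extract it and change into the distribution root (the directory name matches the version you downloaded; 32.0.1 is assumed here):

```shell
# Unpack the release and enter its root directory
tar -xzf apache-druid-32.0.1-bin.tar.gz
cd apache-druid-32.0.1

# The launch scripts, including start-micro-quickstart, live under bin/
ls bin/
```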

 

From the extracted Druid directory, run the following command:

bin/start-micro-quickstart

$bin/start-micro-quickstart
Druid requires Java 11 or 17. Your current version is: 21.0.4.

If you believe this check is in error, or you want to proceed with a potentially
unsupported Java runtime, you can skip this check using an environment variable:

  export DRUID_SKIP_JAVA_CHECK=1

Otherwise, install Java 11 or 17 in one of the following locations.

  * DRUID_JAVA_HOME
  * JAVA_HOME
  * java (installed on PATH)

Other versions of Java may work, but are not officially supported.

For more information about selecting a Java runtime visit:
https://druid.apache.org/docs/latest/operations/java.html

Oops! The latest Druid release requires Java 11 or 17, but I have Java 21.

 

Let's bypass the Java version check by setting the environment variable below.

export DRUID_SKIP_JAVA_CHECK=1

Now let's start Druid again.

$export DRUID_SKIP_JAVA_CHECK=1
$bin/start-micro-quickstart
[Fri Apr 18 07:40:01 2025] Starting Apache Druid.
[Fri Apr 18 07:40:01 2025] Open http://localhost:8888/ in your browser to access the web console.
[Fri Apr 18 07:40:01 2025] Or, if you have enabled TLS, use https on port 9088.
[Fri Apr 18 07:40:01 2025] Starting services with log directory [/Users/h0g01ex/Documents/technical-documents/bigdata/softwares/apache-druid-32.0.1/log].
[Fri Apr 18 07:40:01 2025] Running command[zk]: bin/run-zk conf
[Fri Apr 18 07:40:01 2025] Running command[coordinator-overlord]: bin/run-druid coordinator-overlord conf/druid/single-server/micro-quickstart
[Fri Apr 18 07:40:01 2025] Running command[broker]: bin/run-druid broker conf/druid/single-server/micro-quickstart
[Fri Apr 18 07:40:01 2025] Running command[router]: bin/run-druid router conf/druid/single-server/micro-quickstart
[Fri Apr 18 07:40:01 2025] Running command[historical]: bin/run-druid historical conf/druid/single-server/micro-quickstart
[Fri Apr 18 07:40:01 2025] Running command[middleManager]: bin/run-druid middleManager conf/druid/single-server/micro-quickstart

Open http://localhost:8888/ in your browser to see the Apache Druid console.
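If you want to confirm from the terminal that the cluster is up, each Druid service exposes a `/status` endpoint; hitting it through the router on the default quickstart port looks like this (assumes the cluster started above is still running):

```shell
# Returns JSON with the Druid version and loaded modules once the router is up
curl -s http://localhost:8888/status
```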

 


3. start-micro-quickstart

Let’s understand what happened when I ran the start-micro-quickstart command.

 

When you start Druid with start-micro-quickstart, it spins up a mini cluster on your local machine. Even though it's just one machine, Druid still follows a distributed architecture, so each service runs as a separate process.

 

3.1 zk: Zookeeper

[Fri Apr 18 07:40:01 2025] Running command[zk]: bin/run-zk conf

 

ZooKeeper is a coordination service. Druid uses it for the following operations:

·      Manage cluster metadata (which node is doing what)

·      Handle leader election (e.g., which Coordinator instance is the current leader)

 

3.2 coordinator-overlord: Management Processes

[Fri Apr 18 07:40:01 2025] Running command[coordinator-overlord]: bin/run-druid coordinator-overlord ...

 

Druid combines two critical services into one process:

 

Coordinator

·      Responsible for data management on the Historical nodes

·      Tells Historical nodes which data segments should be loaded or unloaded

·      Ensures replication, availability, and data balancing

 

Overlord

·      Manages task-based ingestion via MiddleManagers

·      Accepts ingestion specs, creates tasks, and monitors them
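Once the quickstart is running, both services answer HTTP APIs through the router; for example (the endpoints below assume the default quickstart ports):

```shell
# Coordinator: per-datasource percentage of segments loaded and queryable
curl -s http://localhost:8888/druid/coordinator/v1/loadstatus

# Coordinator: list the datasources the cluster knows about
curl -s http://localhost:8888/druid/coordinator/v1/datasources
```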

 

3.3 broker: Query Dispatcher

[Fri Apr 18 07:40:01 2025] Running command[broker]: bin/run-druid broker ...

 

The Broker receives queries from clients (such as the Druid console or your application), figures out which nodes hold the needed data, distributes the query to the correct Historical/MiddleManager nodes, then merges the partial results and returns the final response.
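You can see the Broker in action by posting a SQL query over HTTP. The router proxies the SQL API, so this works against port 8888 (assumes the quickstart cluster from Step 2 is running):

```shell
# Druid SQL over HTTP; even with no datasources loaded, scalar queries work
curl -s -X POST http://localhost:8888/druid/v2/sql \
  -H 'Content-Type: application/json' \
  -d '{"query": "SELECT 1 + 1 AS result"}'
```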

 

3.4 router: API Gateway and UI

[Fri Apr 18 07:40:01 2025] Running command[router]: bin/run-druid router ...

 

Acts as a reverse proxy that routes traffic to:

·      the Coordinator/Overlord UI

·      the Broker for queries

·      the Druid console (localhost:8888)

It is especially useful in production setups where you want to expose only one endpoint.

 

You can think of it as an API gateway that forwards your request to the right backend service.

 

3.5 historical: Immutable Data Store

[Fri Apr 18 07:40:01 2025] Running command[historical]: bin/run-druid historical ...

 

·      Stores read-only, queryable data in the form of segments

·      Very fast for historical data queries

·      Used for querying large amounts of past data efficiently

 

3.6 middleManager: Ingestion Engine

[Fri Apr 18 07:40:01 2025] Running command[middleManager]: bin/run-druid middleManager ...

 

·      Handles data ingestion into Druid

·      Runs tasks in separate JVMs (called Peons) to pull, parse, and index data

·      Publishes completed segments, which are then loaded and served by Historical nodes
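Ingestion is driven by submitting a task spec to the Overlord, which hands the work to the MiddleManager; you can watch task state over HTTP (assumes the local quickstart on the default port):

```shell
# List ingestion tasks the Overlord knows about (an empty array on a fresh install)
curl -s http://localhost:8888/druid/indexer/v1/tasks
```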

 

 

