Real-time analytics has become a core requirement in the world of big data. Apache Druid is a high-performance analytics database designed for fast slice-and-dice analytics on large datasets, and Apache Kafka is a widely used distributed event streaming platform.
In this post, I will walk you through the concepts, setup, and step-by-step instructions to get your Kafka data flowing into Druid for real-time querying.
1. Set up Kafka
Apache Kafka is a powerful tool for building real-time data pipelines. It works like a high-speed messaging system, collecting data from various sources and passing it along to systems that process or analyze it.
Think of Kafka as a post office for data:
· Producers drop off messages (e.g., a user clicked a button).
· Kafka stores and delivers these messages.
· Consumers pick up the messages and do something with them, like store them in a database, trigger alerts, or update dashboards.
Kafka is used by companies like LinkedIn, Netflix, and Uber to handle millions of messages per second, making it a perfect fit for real-time data streaming.
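To make these roles concrete, here is a minimal producer/consumer sketch. It assumes the kafka-python client (pip install kafka-python) and a broker on localhost:9092; the clicks topic is purely illustrative, and any Kafka client library follows the same pattern.

from kafka import KafkaConsumer, KafkaProducer

# Producer: drop a message off at the "post office".
producer = KafkaProducer(bootstrap_servers="localhost:9092")
producer.send("clicks", b"user clicked a button")
producer.flush()  # make sure the message actually leaves the client

# Consumer: pick the message up and act on it.
consumer = KafkaConsumer(
    "clicks",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",  # start from the beginning of the topic
)
for message in consumer:
    print(message.value)  # e.g. store it, raise an alert, update a dashboard
    break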
Follow the step-by-step procedure below to set up Kafka.
Step 1: Download Kafka from the following link.
https://kafka.apache.org/downloads
At the time of writing this post, 4.0.0 is the latest version.
Step 2: Generate a cluster UUID.
Navigate to the Kafka home directory and execute the command below.
KAFKA_CLUSTER_ID="$(bin/kafka-storage.sh random-uuid)"
$ KAFKA_CLUSTER_ID="$(bin/kafka-storage.sh random-uuid)"
$ echo $KAFKA_CLUSTER_ID
ImA96y2CTJGN-HzF3GSFWg
Step 3: Format log directories.
bin/kafka-storage.sh format --standalone -t $KAFKA_CLUSTER_ID -c config/server.properties
$ bin/kafka-storage.sh format --standalone -t $KAFKA_CLUSTER_ID -c config/server.properties
Formatting dynamic metadata voter directory /tmp/kraft-combined-logs with metadata.version 4.0-IV3.
Step 4: Start the Kafka Server
bin/kafka-server-start.sh config/server.properties
Once the Kafka server has successfully launched, you will have a basic Kafka environment running and ready to use.
Step 5: Create a Kafka Topic for Order Events
Let’s create a Kafka topic to hold incoming order events from our e-commerce platform. These could be events like a customer placing an order, the order being shipped, or the order getting canceled.
Run the following command to create a topic named ecommerce-orders:
bin/kafka-topics.sh --create --topic ecommerce-orders --bootstrap-server localhost:9092
$ bin/kafka-topics.sh --create --topic ecommerce-orders --bootstrap-server localhost:9092
Created topic ecommerce-orders.
Step 6: Produce Some Sample Order Events
Now let’s simulate a few order events using kafka-console-producer.sh. These JSON messages represent individual e-commerce transactions.
Each message includes:
· order_id: Unique ID for the order
· user_id: ID of the customer
· product_id: ID of the product purchased
· amount: Order total in USD
· order_status: Current status (e.g., placed, shipped, delivered)
· timestamp: Order time in epoch milliseconds (used by Druid for ingestion)
Send a sample order to Kafka:
echo '{"order_id":"ORD12345", "user_id":"U001", "product_id":"P1001", "amount":49.99, "order_status":"placed", "timestamp":1714567890123}' | bin/kafka-console-producer.sh --bootstrap-server localhost:9092 --topic ecommerce-orders
You can send a few more:
echo '{"order_id":"ORD12346", "user_id":"U002", "product_id":"P1002", "amount":89.50, "order_status":"shipped", "timestamp":1714567900000}' | bin/kafka-console-producer.sh --bootstrap-server localhost:9092 --topic ecommerce-orders echo '{"order_id":"ORD12347", "user_id":"U001", "product_id":"P1003", "amount":15.75, "order_status":"delivered", "timestamp":1714567910000}' | bin/kafka-console-producer.sh --bootstrap-server localhost:9092 --topic ecommerce-orders
Step 7: Consume the Messages from the Topic
Let’s verify that Kafka is receiving these messages. Use the consumer to read from the topic:
bin/kafka-console-consumer.sh --topic ecommerce-orders --from-beginning --bootstrap-server localhost:9092
$ bin/kafka-console-consumer.sh --topic ecommerce-orders --from-beginning --bootstrap-server localhost:9092
{"order_id":"ORD12345", "user_id":"U001", "product_id":"P1001", "amount":49.99, "order_status":"placed", "timestamp":1714567890123}
{"order_id":"ORD12346", "user_id":"U002", "product_id":"P1002", "amount":89.50, "order_status":"shipped", "timestamp":1714567900000}
{"order_id":"ORD12347", "user_id":"U001", "product_id":"P1003", "amount":15.75, "order_status":"delivered", "timestamp":1714567910000}
2. Configure Druid to consume from the ecommerce-orders topic
Log in to the Druid console (http://localhost:8888/).
Click on Load data -> Streaming.
Select ‘Apache Kafka’ and click the Connect data button.
Enter localhost:9092 as the Bootstrap server and ecommerce-orders as the topic name, then click the Apply button. Druid starts polling data from the Kafka topic.
Click the Parse data button at the bottom-right corner.
Click the Parse time button.
Here you can observe that Druid mapped the timestamp field of the records to the __time column, which it uses to segment the data.
Next, click through the Transform, Filter, Configure Schema, and Partition tabs.
In the Partition tab, set Segment granularity to day.
Click the Tune button. In the Tune tab, set ‘Use earliest offset’ to True.
Click the Publish button, followed by Edit spec.
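In the Edit spec tab you can review the ingestion spec that the wizard generated as JSON. For this tutorial it should look roughly like the following (a sketch for orientation; exact fields and defaults vary across Druid versions):

{
  "type": "kafka",
  "spec": {
    "ioConfig": {
      "type": "kafka",
      "consumerProperties": { "bootstrap.servers": "localhost:9092" },
      "topic": "ecommerce-orders",
      "inputFormat": { "type": "json" },
      "useEarliestOffset": true
    },
    "dataSchema": {
      "dataSource": "ecommerce-orders",
      "timestampSpec": { "column": "timestamp", "format": "millis" },
      "dimensionsSpec": {
        "dimensions": [
          "order_id",
          "user_id",
          "product_id",
          { "name": "amount", "type": "double" },
          "order_status"
        ]
      },
      "granularitySpec": { "segmentGranularity": "day", "queryGranularity": "none" }
    },
    "tuningConfig": { "type": "kafka" }
  }
}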
Then click the Submit supervisor button.
You can observe that the ecommerce-orders task has been submitted and is in the PENDING state.
Wait a few moments and click the Refresh button; you can observe that the task has moved to the RUNNING state.
Navigate to the Druid Query tab and execute the following query to print all the records in the ecommerce-orders datasource:
SELECT * FROM "ecommerce-orders"
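The same query can also be issued programmatically through Druid’s SQL HTTP API (a POST to the /druid/v2/sql endpoint on the router). A minimal sketch, assuming the Python requests package is installed:

import requests

# POST the SQL statement to the router; Druid returns a JSON array of row objects.
response = requests.post(
    "http://localhost:8888/druid/v2/sql",
    json={"query": 'SELECT * FROM "ecommerce-orders"'},
)
response.raise_for_status()
for row in response.json():
    print(row)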
Let’s add one more event to the ecommerce-orders topic.
echo '{"order_id":"ORD12347", "user_id":"U003", "product_id":"P1004", "amount":99.50, "order_status":"shipped", "timestamp":1714567990900}' | bin/kafka-console-producer.sh --bootstrap-server localhost:9092 --topic ecommerce-orders
Navigate to the Druid Query tab and rerun the SQL query; you can see that the new event has been pulled in by Druid.
That’s it. Happy learning! :)