Tuesday, 11 June 2024

Maximizing Efficiency and User Experience with the Web Queue Worker Pattern

 

Web Queue Worker is a software design pattern commonly used to handle background, asynchronous tasks. With this pattern, we can decouple task execution from the user request and scale the application on demand.

 

Key Components of Web Queue Worker Pattern

1.   Queue: A queue is a data structure that holds tasks to be processed.

2.   Producer: The producer is typically the part of the web application or microservice that creates tasks and adds them to the queue.

3.   Consumer/Worker: The consumer is the worker that processes the tasks from the queue.

4.   Database: The database captures the state and outcome of each submitted task.
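
To make the four roles concrete, here is a minimal in-process sketch. It assumes a standard-library `queue.Queue` standing in for a real message broker and a plain dictionary standing in for the database; the names `produce`, `consume_one`, `task_queue`, and `task_db`, and the status values other than 'Accepted', are hypothetical and chosen only for illustration.

```python
import queue
import uuid

task_queue = queue.Queue()  # Queue: holds the ids of tasks waiting to be processed
task_db = {}                # Database stand-in: task_id -> state and outcome


def produce(operation, instance):
    """Producer: record the task, then add it to the queue."""
    task_id = str(uuid.uuid4())
    task_db[task_id] = {
        "operation": operation,
        "instance": instance,
        "status": "Accepted",
        "outcome": None,
    }
    task_queue.put(task_id)
    return task_id


def consume_one(handler):
    """Consumer/Worker: take one task from the queue and process it."""
    task_id = task_queue.get()
    record = task_db[task_id]
    record["status"] = "InProgress"
    try:
        record["outcome"] = handler(record["operation"], record["instance"])
        record["status"] = "Completed"
    except Exception as exc:
        # Keep the failure reason so the user can see what went wrong.
        record["outcome"] = str(exc)
        record["status"] = "Failed"
    finally:
        task_queue.task_done()
    return task_id
```

In a real deployment the producer and the worker would run in separate processes or services, which is what allows them to be scaled separately.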

 

Let me explain with an example. Imagine you are developing a Control Plane application for a Platform as a Service (PaaS) product, where you need to support the following operations.

 

a.   Scale Up,

b.   Scale Down,

c.    Provision New Instance, and

d.   Decommission.

 

Traditional Approach

1.   User Initiates Operation: The user sends a request to scale up a Cassandra instance.

2.   Application Processes Request: The web server begins the scaling process, which could take anywhere from roughly 30 minutes to 10 hours.

3.   User Waits: The user must wait for the operation to complete before receiving a response.

4.   Timeout Issues: Web servers and browsers have timeout limits. A task taking hours is likely to hit these limits, causing the operation to fail.

5.   Resource Blocking: The web server thread handling the request remains occupied, reducing the server's capacity to handle other requests.

6.   Poor User Experience: Long wait times and potential timeouts lead to a frustrating user experience.

 


As depicted in the diagram, the user submits a Scale Up request to the Control Plane. The Control Plane accepts the request, stores the metadata of this request in a database (marking the current operation state as ‘Accepted’), and simultaneously forwards the request to the Scale Up Service.

 

Problems with this Approach

a. What happens when the Scale Up Service is down or not responding?

If the Scale Up Service is unavailable or unresponsive, the Control Plane cannot proceed with the scaling operation. This results in a stalled or failed operation, leaving the user without a resolution.

 

b. Synchronous Interaction Issues

Because both interactions are synchronous (user to Control Plane, and Control Plane to Scale Up Service), several issues follow:

1.   Poor User Experience: Users experience long wait times while the operation is processed, leading to frustration.

2.   Resource Blocking: The web server thread handling the request remains occupied for the duration of the operation. This reduces the server’s capacity to handle other incoming requests, potentially leading to a bottleneck.

3.   Timeouts: Long-running operations are prone to hitting timeout limits set by web servers and browsers. When this happens, the operation fails, and the user is left without a successful outcome.
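
The resource-blocking problem can be reproduced with a small simulation. The sketch below assumes a web server with a fixed pool of request threads, modeled here with `ThreadPoolExecutor`; the function names are hypothetical, and fractions of a second stand in for the hours a real scale-up could take.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def scale_up_synchronously(duration_s):
    """Stand-in for a long scaling operation that holds its thread until done."""
    time.sleep(duration_s)
    return "scaled"


def quick_request():
    """Stand-in for a cheap request, such as loading a dashboard page."""
    return "ok"


def measure_quick_request_latency(pool_size, long_requests, long_duration_s):
    """Return how long a quick request waits behind synchronous long requests."""
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        for _ in range(long_requests):
            pool.submit(scale_up_synchronously, long_duration_s)
        start = time.monotonic()
        pool.submit(quick_request).result()
        return time.monotonic() - start
```

When the number of long requests reaches the pool size, the quick request cannot start until one of them finishes, so its latency grows to roughly the duration of the long operation.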

 

Web Queue Worker Approach

The Web Queue Worker Pattern addresses these issues by decoupling the initiation of the operation from its execution. Here's how it works:

 

1.   User Initiates Operation: The user sends a request to scale up a Cassandra instance.

2.   Task Queued: The web server adds the scale-up task to a queue and immediately responds to the user, confirming that the request has been received.

3.   Background Worker Processes Task: A background worker picks up the task from the queue and performs the scaling operation.

4.   User Updates: The user can periodically check the status of the operation or receive notifications upon completion.

 


In the diagram above, once the user submits a request, we store the request data in the database and simultaneously enqueue the task in a queue. Once the task has been submitted to the queue successfully, we inform the user that their task has been accepted and provide a status link where they can track its progress. For guidance on modeling asynchronous tasks over HTTP, refer to this article: https://restfulapi.net/http-status-202-accepted/
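
The submit-and-track flow could look roughly like the sketch below. It is framework-free on purpose: each handler simply returns a `(status_code, headers, body)` tuple. The URL layout `/operations/<task_id>`, the handler names, and the in-memory `task_queue` and `task_db` are assumptions made for illustration, not part of any particular product or library.

```python
import json
import queue
import uuid

task_queue = queue.Queue()  # stand-in for a real message queue
task_db = {}                # stand-in for the database


def handle_scale_up_request(payload):
    """Store the request, enqueue the task, and reply 202 with a status link."""
    task_id = str(uuid.uuid4())
    task_db[task_id] = {"status": "Accepted", "request": payload}
    task_queue.put(task_id)
    status_url = f"/operations/{task_id}"  # hypothetical status endpoint
    headers = {"Location": status_url, "Content-Type": "application/json"}
    body = json.dumps(
        {"task_id": task_id, "status": "Accepted", "status_url": status_url}
    )
    return 202, headers, body


def handle_status_request(task_id):
    """Report the current state of a previously submitted task."""
    headers = {"Content-Type": "application/json"}
    record = task_db.get(task_id)
    if record is None:
        return 404, headers, json.dumps({"error": "unknown task"})
    return 200, headers, json.dumps({"task_id": task_id, "status": record["status"]})
```

The user gets the 202 response right away and polls the status link later; the status changes only when a worker updates the task record.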

 

The Scale Up Service, acting as a worker, retrieves tasks from the queue, processes them, and updates the task status in the database accordingly. This approach effectively decouples the user's request submission from its execution. Additionally, it allows us to scale worker services such as the Scale Up Service, Scale Down Service, Provision Service, and Decommission Service independently, without affecting the Control Plane application, when handling a large volume of operation requests.
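
As a rough illustration of independent scaling, the sketch below runs a configurable number of worker threads against a queue. Threads stand in for separately deployed worker instances, and `run_workers`, `stop_workers`, the `None` shutdown sentinel, and the status names other than 'Accepted' are all hypothetical choices for this example.

```python
import queue
import threading


def run_workers(task_queue, task_db, handler, worker_count):
    """Start worker_count workers that pull task ids from task_queue."""

    def loop():
        while True:
            task_id = task_queue.get()
            if task_id is None:  # sentinel: this worker should shut down
                task_queue.task_done()
                return
            try:
                task_db[task_id]["status"] = "InProgress"
                handler(task_id)
                task_db[task_id]["status"] = "Completed"
            except Exception:
                task_db[task_id]["status"] = "Failed"
            finally:
                task_queue.task_done()

    threads = [threading.Thread(target=loop, daemon=True) for _ in range(worker_count)]
    for thread in threads:
        thread.start()
    return threads


def stop_workers(task_queue, threads):
    """Send one shutdown sentinel per worker and wait for all of them to exit."""
    for _ in threads:
        task_queue.put(None)
    for thread in threads:
        thread.join()
```

Giving each operation type its own queue and its own `worker_count` means a burst of scale-up requests can be absorbed by adding scale-up workers only, while the producer side stays unchanged.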

 

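You might need to handle the dual-write problem here: the database insert and the queue submission are two separate writes, so one can succeed while the other fails. Refer to this post for more details.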
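
One common way to mitigate dual writes is the transactional outbox, shown here only as an illustrative sketch and not necessarily the approach the referenced post takes. The idea is to write the task row and an outbox row in a single database transaction, and let a separate relay publish outbox rows to the queue. The table names, column names, and the `publish` callback are assumptions; SQLite is used only because it ships with Python.

```python
import json
import sqlite3
import uuid


def open_db():
    """Create an in-memory database with a tasks table and an outbox table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tasks (id TEXT PRIMARY KEY, operation TEXT, status TEXT)")
    conn.execute(
        "CREATE TABLE outbox (id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "task_id TEXT, payload TEXT, published INTEGER DEFAULT 0)"
    )
    return conn


def accept_task(conn, operation):
    """Insert the task and its outbox message atomically."""
    task_id = str(uuid.uuid4())
    payload = json.dumps({"task_id": task_id, "operation": operation})
    with conn:  # commits both inserts together, or rolls both back on error
        conn.execute("INSERT INTO tasks VALUES (?, ?, 'Accepted')", (task_id, operation))
        conn.execute("INSERT INTO outbox (task_id, payload) VALUES (?, ?)", (task_id, payload))
    return task_id


def relay_outbox(conn, publish):
    """Publish pending outbox rows; rows stay pending if publish raises."""
    rows = conn.execute(
        "SELECT id, payload FROM outbox WHERE published = 0 ORDER BY id"
    ).fetchall()
    for row_id, payload in rows:
        publish(payload)  # e.g. send to the real queue; may raise
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    return len(rows)
```

Because a crash between `publish` and the update would cause the same message to be published again on the next relay run, this gives at-least-once delivery, so workers should treat tasks idempotently.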
