In this post, I am going to explain what rate limiting is, why it is used, and the different strategies/techniques used for rate limiting.
What is rate limiting?
Rate limiting is a strategy to limit network traffic. It protects resources from excessive use and can be applied on both the client side and the server side.
When a client receives a rate limiting error from the server, it should retry the request after a certain delay. As a best practice, the client can increase this delay exponentially after each failed request (exponential backoff).
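Below is a minimal sketch of this retry-with-exponential-backoff behavior in Python. The send_request callable and the RateLimitError exception are hypothetical placeholders for whatever HTTP client you use:

import random
import time

class RateLimitError(Exception):
    """Raised when the server responds with 429 Too Many Requests."""

def call_with_backoff(send_request, max_retries=5, base_delay=1.0):
    for attempt in range(max_retries):
        try:
            return send_request()
        except RateLimitError:
            # Exponential backoff: wait 1s, 2s, 4s, ... plus random jitter
            # so that many clients do not all retry at the same instant.
            delay = base_delay * (2 ** attempt) + random.uniform(0, 1)
            time.sleep(delay)
    raise RateLimitError("request failed after {} retries".format(max_retries))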
Example of server-side rate limiting
We can rate limit REST APIs to protect the underlying database.
Example of client-side rate limiting
Assume you developed a desktop client that seamlessly collaborates on, synchronizes, and stores documents between a remote server and your computer. You should apply rate limiting to upload and download speeds at the client application level; otherwise, the application may end up using all of your internet bandwidth.
Example of rate limiting in distributed computing
You can apply rate limiting to evenly distribute work between multiple worker nodes. This prevents a worker node from accumulating a huge backlog of unprocessed items from the queue.
Why rate limiting?
a. Reduce load on the servers
b. Constrain some malicious activity like DoS (Denial of Service) and DDoS (Distributed Denial of Service) attacks.
c. Improve API availability by avoiding resource starvation
d. Rate limiting can be used to control cost. For example, if my product XYZ has a monthly budget of $10,000, I can use rate limiting to run my application within this budget.
What HTTP status code can I use to convey rate limiting errors?
The HTTP status code 429 Too Many Requests can be used to convey rate limiting errors. While sending the response, the server can optionally include a Retry-After header indicating how long the client should wait before making a new request.
Example
HTTP/1.1 429 Too Many Requests
Content-Type: text/html
Retry-After: 1000
The following are the widely used rate limiting techniques.
a. Fixed window algorithm
In this technique, we rate limit the number of requests per fixed time window, like 100 requests/minute, 5000 requests/hour, etc. This algorithm is easy to implement.
One of the major drawbacks of this algorithm is that it can lead to spikes of traffic at the edges of the window.
For example, suppose we configure 5000 requests/hour. There is a possibility that all 5000 requests are made and served in the first 10 minutes. This might overwhelm the service, and the service will then send rate limiting errors for the next 50 minutes (even though there is not much load on the server during that time).
Another possibility is that all 5000 requests are made in the last 10 minutes (5:50 to 6:00) of one window, and another 5000 requests are made in the first 10 minutes (6:00 to 6:10) of the next window. That means the server receives 10000 requests in 20 minutes (5:50 to 6:10).
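Here is a minimal sketch of a fixed window limiter in Python (the class and method names are my own, not a standard API). Windows are aligned to clock boundaries, which is exactly what makes the edge spikes described above possible:

import time

class FixedWindowLimiter:
    def __init__(self, limit, window_seconds):
        self.limit = limit                    # e.g. 5000 requests
        self.window_seconds = window_seconds  # e.g. 3600 for one hour
        self.current_window = None
        self.count = 0

    def allow(self):
        # Identify the clock-aligned window this request falls into.
        window = int(time.time() // self.window_seconds)
        if window != self.current_window:
            self.current_window = window
            self.count = 0                    # new window, reset the counter
        if self.count < self.limit:
            self.count += 1
            return True
        return False                          # reject; respond with 429

For example, FixedWindowLimiter(5000, 3600) allows at most 5000 calls to allow() to succeed within each clock hour.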
b. Sliding window algorithm
It solves the boundary problem of the fixed window algorithm. Let me explain it with an example. Suppose you configured a sliding window limit of 50 requests/minute.
Whenever a request comes to the server, this algorithm checks the request count over the last minute (not a fixed clock minute). If the number of requests in the last 1 minute is < 50, the request is processed; otherwise, it fails with a rate limiting error.
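A minimal sketch of this check in Python, keeping a log of accepted request timestamps (this variant is sometimes called a sliding log; the names here are illustrative):

import time
from collections import deque

class SlidingWindowLimiter:
    def __init__(self, limit=50, window_seconds=60):
        self.limit = limit
        self.window_seconds = window_seconds
        self.timestamps = deque()  # arrival times of recently accepted requests

    def allow(self):
        now = time.time()
        # Evict timestamps that fell out of the sliding window.
        while self.timestamps and now - self.timestamps[0] >= self.window_seconds:
            self.timestamps.popleft()
        if len(self.timestamps) < self.limit:
            self.timestamps.append(now)
            return True
        return False  # 50 or more requests already in the last minute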
c. Token bucket algorithm
A token bucket adds (generates) tokens at some fixed rate. For example, I can create a token bucket of size 10 with a refill rate of 1 token per 10 seconds. A refill happens only when the number of tokens in the bucket is < 10. Each incoming request consumes one token; when the bucket is empty, requests are rejected.
Let us try to understand this with the example below.
Time      Requests  Available tokens
5:00:00   0         10 (we have 10 tokens initially)
5:00:10   0         10 (refill attempt failed because the bucket is full)
5:00:20   0         10 (refill attempt failed because the bucket is full)
5:00:30   0         10 (refill attempt failed because the bucket is full)
5:00:40   0         10 (refill attempt failed because the bucket is full)
5:00:50   0         10 (refill attempt failed because the bucket is full)
5:00:58   8         2 (10 - 8)
5:01:00   0         3 (refill 1 token)
5:01:10   0         4 (refill 1 token)
5:01:20   0         5 (refill 1 token)
5:01:30   0         6 (refill 1 token)
5:01:49   4         2 (6 - 4)
5:01:50   0         3 (refill 1 token)
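The following is a minimal single-threaded sketch of a token bucket in Python with the same parameters as the table (size 10, one token per 10 seconds); the class and method names are illustrative:

import time

class TokenBucket:
    def __init__(self, capacity=10, refill_interval=10.0):
        self.capacity = capacity                # bucket size (10 tokens)
        self.refill_interval = refill_interval  # seconds per new token
        self.tokens = capacity                  # the bucket starts full
        self.last_refill = time.time()

    def allow(self):
        now = time.time()
        # Add one token per elapsed refill interval; tokens that would
        # overflow a full bucket are simply lost, as in the table above.
        new_tokens = int((now - self.last_refill) / self.refill_interval)
        if new_tokens > 0:
            self.tokens = min(self.capacity, self.tokens + new_tokens)
            self.last_refill += new_tokens * self.refill_interval
        if self.tokens > 0:
            self.tokens -= 1  # each request consumes one token
            return True
        return False          # bucket empty: reject the request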
d. Leaky bucket algorithm
Leaky bucket is easy to implement. In this algorithm, incoming requests are appended to a queue and processed (leaked) at a constant rate, so the burst that can be accepted is limited by the size of the queue. If the size of the queue is set to 5, at most 5 requests can be waiting for this service at any time.
When a new request is received and the queue is full, the API throws an error with response code 429 (rate limiting error).
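A minimal sketch in Python using a bounded queue and a background worker that drains (leaks) one request per second; the process callback and the leak rate are assumptions for illustration, not part of any standard API:

import threading
import time
from queue import Queue, Full

class LeakyBucket:
    def __init__(self, process, queue_size=5, leak_interval=1.0):
        self.process = process                  # caller-supplied request handler
        self.queue = Queue(maxsize=queue_size)  # at most 5 waiting requests
        self.leak_interval = leak_interval      # drain one request per second
        threading.Thread(target=self._leak, daemon=True).start()

    def submit(self, request):
        try:
            self.queue.put_nowait(request)
            return True
        except Full:
            return False  # queue is full: respond with 429

    def _leak(self):
        # Process requests at a constant rate, regardless of how bursty
        # the incoming traffic is.
        while True:
            request = self.queue.get()  # blocks until a request arrives
            self.process(request)
            time.sleep(self.leak_interval)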