Rate limiting vs. throttling and other API traffic management

It’s safe to assume that the average human understands how infrastructure like a traffic light works. However, we tend to underestimate that infrastructure’s importance. Roads without traffic systems would descend into chaos: the average driver lacks lane discipline, and in many cities, the road network can’t even keep up with population growth.


This analogy extends to API design. Just as undisciplined drivers can cause havoc on the roads, malicious users can threaten your application. Plus, as your user base grows, managing traffic surges becomes essential. One effective method to handle this, like traffic lights for roads, is rate limiting.

In this tutorial, we’ll explore rate limiting vs. throttling and other API traffic management techniques. We’ll cover how they work, how to implement them, when to use each strategy, and provide a comparison table to help you decide which approach suits your needs best.

What is rate limiting?

Rate limiting is a technique for controlling the amount of incoming and outgoing traffic on an API by setting a predefined limit on how many requests an API user can make within a given timeframe. This way, you can prevent a single user from monopolizing your API infrastructure resources and simultaneously prevent malicious attacks such as denial-of-service (DoS) and brute-force attacks.

Behind the scenes, rate limiting is implemented with a rate limiter that checks each incoming request to see whether the user is still within their request limit or has exceeded it:

A flowchart shows a rate limiter process for defining when a user can proceed.

As the flowchart above shows, the request is processed if the user is within their limit, and their remaining quota is then updated; if they have already exceeded their limit, the request is denied. How these limits are capped and how often they are replenished depends on your organization’s preferences, which are in turn shaped by your system capacity and business requirements.

Implementing rate limiting

In practice, rate limiting can be implemented using various algorithms, each with its own method for managing request rates. Some popular ones include the following:

  • Token Bucket — This algorithm uses a “bucket” holding a fixed number of tokens. Each request removes a token from the bucket; if the bucket is empty, the request is denied.
  • Leaky Bucket — Similar to the token bucket, except that requests are processed at a constant rate. If the incoming request rate exceeds the processing rate, the excess requests are discarded or delayed.
  • Fixed Window — Requests are counted and limited within fixed time windows (e.g., per minute or per hour). If the number of requests exceeds the limit within the window, further requests are denied until the next window begins (see the fixed window sketch after this list).
  • Sliding Log — Each request is logged with a timestamp. To determine whether a new request is allowed, the algorithm counts the logged requests within the allowed time frame and decides based on that count.
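
To make the contrast concrete, here’s a minimal sketch of the fixed window approach, assuming a simple in-memory counter (the FixedWindowLimiter class and its names are illustrative, not from a library):

class FixedWindowLimiter {
    constructor(limit, windowMs) {
        this.limit = limit;       // max requests allowed per window
        this.windowMs = windowMs; // window length in milliseconds
        this.windowStart = Date.now();
        this.count = 0;
    }
    allowRequest() {
        const now = Date.now();
        // Start a fresh window if the current one has expired
        if (now - this.windowStart >= this.windowMs) {
            this.windowStart = now;
            this.count = 0;
        }
        if (this.count < this.limit) {
            this.count += 1;
            return true;
        }
        return false; // limit reached; deny until the next window
    }
}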

For example, we can utilize the pattern shown in the code below to implement rate limiting using the token bucket algorithm:

class TokenBucket {
    constructor(rate, capacity) {
        this.rate = rate;         // tokens added per second
        this.capacity = capacity; // maximum tokens the bucket can hold
        this.tokens = capacity;   // start with a full bucket
        this.lastRequestTime = Date.now();
    }
    addTokens() {
        const now = Date.now();
        const elapsed = (now - this.lastRequestTime) / 1000; // convert to seconds
        const addedTokens = elapsed * this.rate;
        // Refill the bucket without exceeding its capacity
        this.tokens = Math.min(this.capacity, this.tokens + addedTokens);
        this.lastRequestTime = now;
    }
    allowRequest(tokensNeeded = 1) {
        this.addTokens();
        if (this.tokens >= tokensNeeded) {
            this.tokens -= tokensNeeded; // spend tokens for this request
            return true;
        }
        return false; // bucket is empty: deny the request
    }
}

In this code sample, we define a TokenBucket class that sets the token generation rate and capacity and records the time of the last request. The addTokens() method calculates how many tokens to add based on the time elapsed since the last request and updates the user’s current token count. Finally, the allowRequest() method checks whether there are enough tokens for a request, deducts the necessary tokens, and returns whether the request is allowed.

Applying this implementation in an application would look something like this:

const bucket = new TokenBucket(1, 10); // 1 token per second, max 10 tokens
function handleRequest(userRequest) {
  if (bucket.allowRequest()) {
    // Request allowed
    userRequest();
  } else {
    console.log("Too many requests, please try again later");
  }
}
function getPosts() {
  fetch('/path/to/api')
}
handleRequest(getPosts);

In this usage example, we initialize a new TokenBucket instance with a rate of one token per second and a capacity of 10 tokens. We then create a handleRequest() function that executes the request if it is allowed and logs a message otherwise. Finally, we test our request handler with a hypothetical getPosts() function.

This example is written in JavaScript, but the same pattern should help you get started with rate limiting, and the token bucket algorithm in particular, in any language. For another practical implementation with Node.js, you can check out this article.

Almost all languages and frameworks also have libraries with which you can easily implement rate limiting without reinventing the wheel; some popular ones in the JavaScript ecosystem include the express-rate-limit package for Express.js and @nestjs/throttler for NestJS applications.
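
For instance, a basic setup with express-rate-limit looks roughly like the snippet below. The windowMs, max, and message options follow the package’s documented API, but option names have evolved across versions, so check the current docs before copying:

const express = require('express');
const rateLimit = require('express-rate-limit');

const app = express();

// Allow each client IP at most 100 requests per 15-minute window
const limiter = rateLimit({
  windowMs: 15 * 60 * 1000, // 15 minutes
  max: 100, // limit each IP to 100 requests per window
  message: 'Too many requests, please try again later',
});

app.use(limiter); // apply the limiter to all routes
app.listen(3000);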

Rate limiting alternatives

API traffic management is not limited to rate limiting; there are alternative techniques for controlling your application’s usage and managing traffic surges. Let’s quickly explore them below.

Throttling

Throttling is another technique for controlling the rate at which users can make requests to an API. Unlike rate limiting, which blocks requests once a limit is exceeded, throttling slows down the request rate by introducing delays:

A flowchart shows the process for throttling in API traffic management.

By design, throttling smooths out traffic spikes, and users experience delayed requests rather than outright denials. One downside, however, is that these deliberate delays increase latency and can make the system feel slower, as each request waits in a queue for processing. Additionally, throttling logic can be more complex to implement than simple rate limiting, and in extreme cases, throttling alone may not protect the system from overload.

How to implement throttling

Throttling can be implemented by keeping a queue of request timestamps, counting the number of requests during a given time period, and introducing delays if the request rate exceeds the permitted limit. An example is shown below:

class Throttler {
    constructor(maxRequests, period) {
        this.maxRequests = maxRequests;
        this.period = period; // time window in milliseconds
        this.requestTimes = [];
    }
    removeOldRequests() {
        // Filter out request timestamps older than the current window
        const cutoff = Date.now() - this.period;
        this.requestTimes = this.requestTimes.filter((time) => time > cutoff);
    }
    allowRequest() {
        this.removeOldRequests();
        // Check if current requests are below maxRequests
        if (this.requestTimes.length < this.maxRequests) {
            // If yes, log the current timestamp and allow the request
            this.requestTimes.push(Date.now());
            return true;
        }
        return false; // If no, deny the request
    }
    delayRequest() {
        // Calculate the delay needed until the next request can be allowed:
        // the time until the oldest logged request leaves the window
        this.removeOldRequests();
        if (this.requestTimes.length < this.maxRequests) return 0;
        return Math.max(0, this.requestTimes[0] + this.period - Date.now());
    }
}

In this example, the Throttler class manages the request rate by keeping a queue of request timestamps. The removeOldRequests() method discards timestamps older than the configured period. The allowRequest() method then checks whether the number of requests within the period is below the maximum allowed; if so, it logs the current timestamp and permits the request, and otherwise it denies the request. Finally, the delayRequest() method estimates how long to wait until the next request can be allowed.
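
One way to wire this into a request handler is to fall back to delayRequest() and retry after the computed wait. The handleRequest() wrapper below is a hypothetical example, not part of the original design:

const throttler = new Throttler(5, 1000); // max 5 requests per second

function handleRequest(userRequest) {
  if (throttler.allowRequest()) {
    userRequest(); // within the limit: process immediately
  } else {
    // Over the limit: delay the request instead of denying it outright
    const wait = throttler.delayRequest();
    console.log(`Throttled, retrying in ${wait}ms`);
    setTimeout(() => handleRequest(userRequest), wait);
  }
}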

Spike control

Spike control is another popular technique for managing sudden surges in traffic that can overwhelm an API or service. It works by monitoring the request rate over short intervals and implementing measures such as temporarily blocking requests, redirecting traffic, or scaling resources to accommodate the increased load.

For example, imagine a scenario where your API can normally handle 100 requests per minute. With spike control, you set a threshold to detect if the number of requests suddenly jumps to 150 per minute:

A flowchart shows how the spike control management process works.

As demonstrated above, when such a spike is detected, you can configure your system to respond by temporarily blocking new requests to prevent overload, redirecting traffic to additional servers to balance the load, or quickly scaling up resources to meet the increased demand.
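
As a rough illustration, a spike detector can track a rolling request count and flip into a temporary blocking mode once a threshold is crossed. The SpikeController class and the numbers below are illustrative assumptions, not a production design:

class SpikeController {
    constructor(threshold, intervalMs = 60000, blockMs = 30000) {
        this.threshold = threshold;   // e.g., 150 requests per interval
        this.intervalMs = intervalMs; // monitoring window (one minute here)
        this.blockMs = blockMs;       // how long to block after a spike
        this.timestamps = [];
        this.blockedUntil = 0;
    }
    allowRequest() {
        const now = Date.now();
        if (now < this.blockedUntil) {
            return false; // still cooling down after a detected spike
        }
        // Keep only the requests within the monitoring interval
        this.timestamps = this.timestamps.filter((t) => now - t < this.intervalMs);
        this.timestamps.push(now);
        if (this.timestamps.length > this.threshold) {
            // Spike detected: temporarily block new requests and reset the count
            this.blockedUntil = now + this.blockMs;
            this.timestamps = [];
            return false;
        }
        return true;
    }
}

const spikeControl = new SpikeController(150); // flag spikes above 150 requests/minute

In practice, the response to a detected spike could just as well be redirecting traffic or triggering autoscaling rather than blocking outright.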

Circuit breaking

Circuit breaking is also an effective technique for managing the resilience of an API or service, especially in the face of failures or performance degradation. It works by monitoring the health of service interactions and temporarily halting requests to a failing service to prevent cascading failures. As the repeated use of the word “service” suggests, circuit breaking is more common and more useful in microservices architectures than in monolithic or simple API systems, unlike the previous techniques we’ve covered.

Imagine a scenario where your service interacts with a third-party API. If the third-party API starts failing or responding slowly, your system can use a circuit breaker to detect this issue and stop making further requests to the failing service for a set period.

When the circuit breaker detects multiple consecutive failures or timeouts, it “trips” the circuit, temporarily blocking new requests to the problematic service. During this time, the system can return a fallback response or an error message to the user. Then, after a specified timeout period, the circuit breaker allows limited test requests to check if the service has recovered. If the service responds successfully, the circuit is closed, and normal operations resume. If failures continue, the circuit remains open, and requests are blocked again.
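
A minimal circuit breaker sketch following this closed/open/half-open cycle might look like the following; the thresholds, state names, and fallback handling are illustrative choices:

class CircuitBreaker {
    constructor(failureThreshold = 5, resetTimeoutMs = 30000) {
        this.failureThreshold = failureThreshold; // consecutive failures before tripping
        this.resetTimeoutMs = resetTimeoutMs;     // how long the circuit stays open
        this.failures = 0;
        this.state = 'CLOSED';
        this.openedAt = 0;
    }
    async call(requestFn, fallback) {
        if (this.state === 'OPEN') {
            if (Date.now() - this.openedAt >= this.resetTimeoutMs) {
                this.state = 'HALF_OPEN'; // timeout elapsed: allow a test request
            } else {
                return fallback(); // circuit is open: fail fast
            }
        }
        try {
            const result = await requestFn();
            this.failures = 0;
            this.state = 'CLOSED'; // success: close the circuit
            return result;
        } catch (err) {
            this.failures += 1;
            if (this.state === 'HALF_OPEN' || this.failures >= this.failureThreshold) {
                this.state = 'OPEN'; // trip the circuit
                this.openedAt = Date.now();
            }
            return fallback();
        }
    }
}

const breaker = new CircuitBreaker();
breaker.call(
    () => fetch('/path/to/third-party/api'),
    () => ({ error: 'Service temporarily unavailable' })
);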

Deciding which API traffic management technique to use

Deciding which technique to use depends mostly on your application’s niche and requirements. However, considering all we’ve covered so far, rate limiting is ideal for applications that need to enforce strict request quotas, such as public APIs or APIs with tiered access levels. Throttling is better suited to applications where maintaining performance and user experience is critical, such as e-commerce sites or social media platforms, as it introduces delays rather than outright blocking requests, thereby smoothing out traffic spikes:

A series of API traffic management strategies are shown on the left side, their purpose and benefits are placed in the center, and their potential downsides are written on the right.

Spike control would be key for applications that experience unpredictable surges in traffic, for example, ticketing websites during high-demand events or news sites during breaking news. Circuit breaking is particularly useful for applications that depend on multiple external services, like microservices architectures or SaaS platforms, as it prevents cascading failures by stopping requests to a failing service while simultaneously allowing the system to remain responsive.

It’s also possible to combine multiple strategies for even more effective traffic management. In some cases, you can further apply load balancing to distribute traffic across servers.
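
For example, the TokenBucket and CircuitBreaker sketches above could be composed so that a request must pass both checks; this composition is illustrative, not a prescribed pattern:

const bucket = new TokenBucket(1, 10); // 1 token per second, max 10 tokens
const breaker = new CircuitBreaker();

async function guardedRequest(requestFn, fallback) {
    // First gate: the client must be within its rate limit
    if (!bucket.allowRequest()) {
        return fallback();
    }
    // Second gate: the circuit breaker guards the downstream call
    return breaker.call(requestFn, fallback);
}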

Comparing rate limiting, throttling, spike control, and circuit breaking

The table below highlights the major differences between the different API traffic management techniques we covered to help you quickly decide which might be best for you.

Strategy | Description | Best for | Protection against | Example application | Suitable architecture
Rate limiting | Limits the number of requests a user can make in a given time period. | Enforcing quotas, preventing abuse | Abuse, overuse | Public APIs, SaaS applications | Monolithic, microservices
Throttling | Slows down the request rate by introducing delays. | Smoothing traffic spikes, maintaining performance | Performance degradation | Ecommerce, social media | Monolithic, microservices
Spike control | Manages sudden traffic surges by temporarily blocking or redirecting requests. | Managing traffic surges, ensuring stability | System overload | Ticketing systems, news websites | Microservices, serverless
Circuit breaking | Temporarily halts requests to a failing service to prevent cascading failures. | Preventing cascading failures, maintaining responsiveness | Service failures | Payment gateways, SaaS platforms | Microservices, distributed

Conclusion

In this tutorial, we explored rate limiting and other API traffic management techniques such as throttling, spike control, and circuit breaking. We covered how they work, their basic implementation, and their ideal application areas. Taking appropriate traffic management measures ensures your API can serve its users as intended, and this article provides a quick guide to help you decide how and when to use each technique.


Source: blog.logrocket.com
