Rate limiting is a defensive strategy for controlling network traffic. It involves setting an upper limit on how often users or applications can perform an action within a defined timeframe, thereby capping the number of requests that can reach web servers. For this reason, the technique is used to protect websites and applications from unintentional or malicious overloading, for example by bots.
APIs are frequent points of attack as they are often publicly accessible and used by a large number of clients and applications. Faulty implementations or malicious attacks can expose APIs to an uncontrolled request load.
One consequence can be degraded service for legitimate users, as individual clients or applications monopolize the service. Rate limiting therefore plays an important role in the protection of APIs.
The way rate limiting works is based on tracking requests from a specific source within a defined period of time. The first step is to identify this source, be it an IP address, a user ID, or an API key.
The number of requests originating from a source is then counted in the defined time window. If the number of requests exceeds the configured limits, an action is executed that rejects or delays the request.
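The flow described above can be sketched in a few lines of Python. This is a minimal, illustrative example, not a production implementation: the function name `handle_request`, the constants, and the in-memory dictionary are all assumptions for demonstration; a real deployment would key on an IP address, user ID, or API key and store counters in shared storage such as Redis.

```python
import time
from collections import defaultdict

# Hypothetical limits for illustration: 3 requests per 60-second window.
WINDOW_SECONDS = 60
MAX_REQUESTS = 3

# Per-source state: when the source's current window started and how
# many requests it has made in that window.
counters = defaultdict(lambda: {"start": time.monotonic(), "count": 0})

def handle_request(source_id: str) -> bool:
    """Identify the source, count its requests, reject above the limit."""
    entry = counters[source_id]
    now = time.monotonic()
    if now - entry["start"] >= WINDOW_SECONDS:
        # The window has elapsed: start a fresh one for this source.
        entry["start"], entry["count"] = now, 0
    entry["count"] += 1
    return entry["count"] <= MAX_REQUESTS
```

Each source gets its own counter, so one noisy client exhausting its quota does not affect others.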
Various algorithms can be used to implement rate limiting. The token bucket algorithm uses a virtual bucket that is filled with a certain number of tokens. One token is consumed for each request that a source makes to the target server. The bucket is refilled after a certain period of time, but if it is empty before the time expires, further requests are stopped.
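A token bucket can be sketched as follows. This variant refills the bucket continuously in proportion to elapsed time, which is a common implementation choice; the class and parameter names are illustrative, not from any particular library.

```python
import time

class TokenBucket:
    """Token bucket: each request consumes a token; tokens refill over time."""

    def __init__(self, capacity: int, refill_rate: float):
        self.capacity = capacity          # maximum tokens the bucket holds
        self.tokens = float(capacity)     # start with a full bucket
        self.refill_rate = refill_rate    # tokens added per second
        self.last_refill = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, never beyond capacity.
        self.tokens = min(
            self.capacity,
            self.tokens + (now - self.last_refill) * self.refill_rate,
        )
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1  # the request consumes one token
            return True
        return False          # bucket empty: request is stopped
```

Because unused tokens accumulate up to the capacity, this algorithm tolerates short bursts while still enforcing an average rate.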
The leaky bucket algorithm is comparable. A virtual bucket is also used here, except that it collects requests instead of tokens. Because the bucket drains at a constant rate, normal traffic should never fill it completely. If it does fill up anyway, further requests are dropped.
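A leaky bucket sketch, under the same caveats as above (illustrative names, single-process, in-memory state):

```python
import time

class LeakyBucket:
    """Leaky bucket: requests fill the bucket; it drains at a constant rate."""

    def __init__(self, capacity: int, leak_rate: float):
        self.capacity = capacity      # maximum requests the bucket can hold
        self.level = 0.0              # current fill level
        self.leak_rate = leak_rate    # requests drained per second
        self.last_check = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Drain the bucket in proportion to elapsed time.
        self.level = max(0.0, self.level - (now - self.last_check) * self.leak_rate)
        self.last_check = now
        if self.level + 1 <= self.capacity:
            self.level += 1   # the request is accepted into the bucket
            return True
        return False          # bucket full: request is dropped
```

The key difference from the token bucket is the output side: the leaky bucket releases requests at a smooth, constant rate, whereas the token bucket permits bursts up to its capacity.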
With the fixed window counter algorithm, fixed time windows are set and a counter records the number of requests within each one. If the number of requests exceeds the defined limit, further attempts are rejected until the current window ends and the counter is reset.
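The fixed window counter is the simplest of these algorithms, as a short sketch shows (illustrative names again):

```python
import time

class FixedWindowCounter:
    """Counts requests per fixed time window; resets when the window ends."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window = window_seconds
        self.window_start = time.monotonic()
        self.count = 0

    def allow(self) -> bool:
        now = time.monotonic()
        if now - self.window_start >= self.window:
            # Window elapsed: start a new one and reset the counter.
            self.window_start = now
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False  # limit reached: reject until the window resets
```

Its known weakness is the window boundary: a client can send a full quota at the end of one window and another full quota at the start of the next, briefly doubling the effective rate.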
With the sliding window log algorithm, requests are logged with a time stamp and counted within a continuously sliding time window. If the number of time stamps reaches the predefined number, further requests are stopped.
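A sliding window log can be sketched with a queue of timestamps; this is a minimal in-memory version with illustrative names:

```python
import time
from collections import deque

class SlidingWindowLog:
    """Logs request timestamps; counts only those inside the sliding window."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window = window_seconds
        self.timestamps = deque()

    def allow(self) -> bool:
        now = time.monotonic()
        # Discard timestamps that have slid out of the window.
        while self.timestamps and now - self.timestamps[0] >= self.window:
            self.timestamps.popleft()
        if len(self.timestamps) < self.limit:
            self.timestamps.append(now)
            return True
        return False  # window already holds the maximum number of requests
```

This approach avoids the boundary problem of fixed windows, but it stores one timestamp per request, which can be memory-intensive at high traffic volumes.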
A combination of the fixed window and sliding window log algorithms is the sliding window counter algorithm. Here, requests are counted per fixed window, and the count for the continuously sliding window is estimated by weighting the previous window's count according to how much of it the sliding window still overlaps.
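The weighted estimate can be sketched as follows. This is a simplified single-source version with illustrative names; real implementations typically align windows to fixed clock boundaries rather than resetting them on demand.

```python
import time

class SlidingWindowCounter:
    """Estimates a sliding-window count from the current and previous fixed windows."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window = window_seconds
        self.current_start = time.monotonic()
        self.current_count = 0
        self.previous_count = 0

    def allow(self) -> bool:
        now = time.monotonic()
        elapsed = now - self.current_start
        if elapsed >= self.window:
            # Roll over; if more than one full window passed, the old count is stale.
            self.previous_count = self.current_count if elapsed < 2 * self.window else 0
            self.current_start = now  # simplification: new window starts now
            self.current_count = 0
            elapsed = 0.0
        # Weight the previous window by its remaining overlap with the sliding window.
        weight = 1.0 - elapsed / self.window
        estimated = self.previous_count * weight + self.current_count
        if estimated < self.limit:
            self.current_count += 1
            return True
        return False
```

The result is an approximation of the sliding window log that needs only two counters per source instead of a timestamp per request.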
Rate limiting is also used as a security measure for user logins. It serves not only to protect the infrastructure but also end users. In particular, it is intended to counteract brute force attacks, in which attackers systematically try out different password combinations in order to gain unauthorized access to user accounts.
Rate limiting can also be used to prevent credential stuffing attacks, in which criminals try out combinations of stolen usernames and passwords to gain access to accounts. By limiting the number of login attempts within a defined time window, such attacks are made more difficult and system resources are protected from misuse at the same time.
Rate limiting and throttling are often used interchangeably, but throttling works differently. With throttling, requests are not rejected outright but delayed in order to reduce the load on the servers.
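The difference can be seen in a small sketch: instead of returning a rejection, a throttle makes the caller wait until the next request slot is free. Names and the fixed-interval policy are illustrative assumptions.

```python
import time

class Throttle:
    """Delays requests so they are spaced at least `interval` seconds apart."""

    def __init__(self, interval: float):
        self.interval = interval
        self.next_allowed = time.monotonic()

    def wait(self) -> float:
        """Block until the request may proceed; return how long we waited."""
        now = time.monotonic()
        delay = max(0.0, self.next_allowed - now)
        if delay:
            time.sleep(delay)  # delay the request rather than rejecting it
        self.next_allowed = max(now, self.next_allowed) + self.interval
        return delay
```

Every request eventually gets through, just more slowly, which suits batch clients; a rate limiter, by contrast, fails fast with an error such as HTTP 429.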
Rate limiting is effective against certain types of attacks, but it is not a panacea. It should be used in combination with other security measures, such as a web application firewall (WAF), to ensure comprehensive protection.
Like a bouncer restricting entry to a certain number of people per hour, rate limiting restricts requests to a certain number per set time window. Therefore, rate limiting is an important tool for anyone running a website, API, or other online service. It provides important protection against abuse and overload, keeping services available and reliable for the end user.