Auto Scaling

  • Link11-Team
  • April 24, 2025

Auto Scaling

Auto Scaling is the automatic increase or decrease of computational resources that are available for assignment to workloads.

Auto Scaling is closely associated with load balancing. Strictly speaking, a load balancer does not require auto scaling capabilities. However, load balancers that include auto scaling are generally much more effective. Auto scaling is also ideally suited for load-balanced workloads that run in the cloud, and today the large cloud providers integrate auto scaling into their cloud load balancing capabilities.

Auto Scaling is a straightforward concept. Backend servers are brought online or offline automatically, depending on the computational workloads that they must handle. Load balancing then distributes the workloads across the pool of servers. (Note that in this context, “server” can be, but is not necessarily, a physical machine. It could also be a virtual machine, a cloud instance, etc.)

Auto Scaling has two primary tasks:

  1. Ensuring Quick and Efficient Workload Processing: The first goal of auto scaling is to have enough servers available to process workloads quickly and efficiently, even during peak traffic periods or in the face of security threats.
  2. Minimizing Operational Expenses: While efficiently handling workloads is essential, avoiding excessive available capacity is equally crucial to optimize operational costs. Auto Scaling dynamically adjusts the number of servers, preventing unnecessary expenses on unused resources.

Generally, the first task is prioritized over the second. For example, when using reactive auto scaling (explained below), it is common for organizations to use a configuration that scales up aggressively when workloads increase, but scales down more slowly after workloads decrease.
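As a rough illustration of such an asymmetric configuration (not tied to any particular provider's API; all names, thresholds, and step sizes below are hypothetical), a policy definition might look like this:

```python
# Hypothetical sketch of an asymmetric reactive policy: scale up quickly
# and in larger steps, scale down slowly and in smaller steps.
from dataclasses import dataclass

@dataclass
class ScalingPolicy:
    scale_up_threshold: float = 0.70    # add capacity when average CPU exceeds 70%
    scale_down_threshold: float = 0.30  # remove capacity only when it drops below 30%
    scale_up_step: int = 4              # add several servers at once
    scale_down_step: int = 1            # remove servers one at a time
    scale_up_cooldown_s: int = 60       # react to rising load within a minute
    scale_down_cooldown_s: int = 600    # wait ten minutes before shrinking again

def desired_change(policy: ScalingPolicy, avg_cpu: float) -> int:
    """Return how many servers to add (positive) or remove (negative)."""
    if avg_cpu > policy.scale_up_threshold:
        return policy.scale_up_step
    if avg_cpu < policy.scale_down_threshold:
        return -policy.scale_down_step
    return 0
```

The asymmetry comes from the larger step size and the much shorter cooldown on the scale-up side: capacity is added eagerly but released cautiously.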

A perfect auto scaler would be able to accomplish both tasks. At any given time, there would be just enough available server capacity to handle the current workload effectively. Although this perfection is impossible in the real world, much progress has been made toward this ideal, as reflected in the policy options below.

Types of Auto Scaling Policies

When configuring an auto scaling system, there is usually a setting (often called its policy) that defines how scaling occurs. There are three primary types of scaling policies: scheduled, reactive, and predictive.

Scheduled Scaling

Scheduled scaling involves automatically bringing servers up or down according to a preset schedule. For instance, an organization might have higher workload requirements during business hours. To save on electricity and other operational expenses during off-peak times, a certain number of servers can be scheduled to go offline each night.
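A minimal sketch of such a schedule (the hours and server counts are invented for illustration) could be as simple as a rule keyed to the time of day:

```python
# Hypothetical scheduled-scaling rule: keep more servers online during
# business hours and fewer overnight.
from datetime import datetime

BUSINESS_HOURS_CAPACITY = 10   # servers online from 08:00 to 20:00
OFF_PEAK_CAPACITY = 3          # servers online overnight

def scheduled_capacity(now: datetime) -> int:
    """Return how many servers should be online at the given time."""
    if 8 <= now.hour < 20:
        return BUSINESS_HOURS_CAPACITY
    return OFF_PEAK_CAPACITY

print(scheduled_capacity(datetime(2025, 4, 24, 22, 0)))  # off-peak -> 3
```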

Reactive Scaling

Reactive scaling means that servers are brought up and down in reaction to changes in workloads. As workloads increase, the system responds by bringing more servers online. Subsequently, when workload requirements decline, servers are taken offline again. This is much more effective than scheduled auto scaling. (In other words, available server capacity is much more likely to closely track current workload requirements.) However, it is also much more complicated.

The system must be able to closely and correctly assess incoming workloads, while also gauging available extra capacity within the servers that are currently online. This is closely tied into the operation of the underlying load balancer. Indeed, many of the same considerations apply; for example, there are a variety of possible metrics to use for assessing capacity (e.g., current bandwidth usage, number of connections, CPU usage, memory usage, etc.). Each metric has its advantages and disadvantages. (For more information on this, see How a Load Balancer Works).
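To make the idea concrete, here is a minimal sketch of a reactive control loop built around a CPU metric. The functions get_average_cpu, get_server_count, and set_server_count are hypothetical placeholders for whatever monitoring and provisioning APIs are actually in use:

```python
# Hypothetical reactive control loop: poll a utilization metric and resize
# the server pool so that average CPU stays near a target value.
import time

TARGET_CPU = 0.60                 # aim to keep average CPU around 60%
MIN_SERVERS, MAX_SERVERS = 2, 50  # hard limits on pool size

def reactive_loop(get_average_cpu, get_server_count, set_server_count,
                  interval_s: int = 60) -> None:
    while True:
        cpu = get_average_cpu()        # e.g. 0.85 means 85% average CPU
        current = get_server_count()
        # Scale capacity roughly in proportion to how far the observed
        # metric is from the target (a simple target-tracking rule).
        desired = round(current * cpu / TARGET_CPU)
        desired = max(MIN_SERVERS, min(MAX_SERVERS, desired))
        if desired != current:
            set_server_count(desired)
        time.sleep(interval_s)
```

The same structure works with other metrics (connections, bandwidth, memory); only the measured value and the target change.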

The reactive method can be very effective, but it has one potential flaw: it waits until workloads increase before scaling up computational resources. This means there is a slight delay, a short period in which workloads have risen but additional capacity is not yet available. When workloads spike quickly, this can cause problems, and clients can experience degraded performance.
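A back-of-the-envelope calculation (with invented numbers) shows why this delay matters:

```python
# Hypothetical numbers illustrating the reactive-scaling delay.
current_capacity_rps = 1000   # requests per second the current pool can serve
spike_load_rps = 1800         # sudden jump in incoming requests per second
provisioning_delay_s = 180    # time until newly started servers take traffic

excess_rps = spike_load_rps - current_capacity_rps
backlog = excess_rps * provisioning_delay_s
print(f"~{backlog:,} requests arrive before extra capacity is ready")
# -> ~144,000 requests that must be queued, slowed down, or rejected
```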

Predictive Scaling

Predictive scaling addresses one potential flaw of reactive scaling – the slight delay in scaling up computational resources when workloads increase rapidly. To counter this, predictive scaling uses historical data analysis to identify resource usage patterns and predicts when workloads will increase. The system proactively expands server capacity just before the expected increase occurs, ensuring smooth handling of traffic spikes.
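As a simplified sketch of the idea (real implementations use far richer models), capacity could be provisioned for the coming hour based on an hourly load profile learned from past traffic. All names and values here are illustrative:

```python
# Hypothetical predictive-scaling sketch: learn an average load per hour of
# day from historical observations, then provision for the *next* hour so
# capacity is ready before the expected increase arrives.
import math
from statistics import mean

def hourly_profile(history: dict[int, list[float]]) -> dict[int, float]:
    """history maps hour-of-day (0-23) to past observed loads (requests/s)."""
    return {hour: mean(loads) for hour, loads in history.items()}

def predicted_capacity(profile: dict[int, float], next_hour: int,
                       per_server_capacity: float) -> int:
    """Number of servers needed for the expected load in the coming hour."""
    expected_load = profile[next_hour % 24]
    return math.ceil(expected_load / per_server_capacity)

history = {9: [4000.0, 4400.0], 3: [250.0, 350.0]}   # past requests/s by hour
profile = hourly_profile(history)                     # {9: 4200.0, 3: 300.0}
print(predicted_capacity(profile, next_hour=9, per_server_capacity=500.0))  # -> 9
```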

Predictive scaling is the most sophisticated of the three approaches, but it is also the most challenging to implement correctly. Major cloud providers such as AWS now offer predictive scaling as part of their auto scaling services.

The Future of Auto Scaling

Auto Scaling technology continues to evolve, driven by ongoing efforts to improve its capabilities and performance. A promising direction for the future is the integration of multiple scaling policies simultaneously. For example, a web application could leverage both reactive and predictive scaling to dynamically respond to immediate changes in traffic while also benefiting from historical data analysis to anticipate future demands accurately.
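One simple way to combine the two policies (sketched here with hypothetical inputs) is to let whichever policy requests more capacity win, so that predictive pre-provisioning never suppresses a reactive response to an unexpected spike:

```python
# Hypothetical combination rule: take the larger of the reactive and
# predictive capacity recommendations, clamped to the pool's limits.
def combined_capacity(reactive_desired: int, predictive_desired: int,
                      min_servers: int = 2, max_servers: int = 50) -> int:
    desired = max(reactive_desired, predictive_desired)
    return max(min_servers, min(max_servers, desired))

print(combined_capacity(reactive_desired=8, predictive_desired=12))  # -> 12
```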

Researchers and providers are actively developing best practices for combining scaling policies effectively. By optimizing the synergy between these policies, organizations can achieve superior results tailored to their specific applications. Auto Scaling is expected to remain a critical component of web security, playing a vital role in ensuring that web applications can handle varying traffic loads efficiently and securely.

Today’s auto scalers are powerful and sophisticated, but providers continue working to make them even better.

Summary

Auto Scaling allows web applications to dynamically adjust computational resources based on traffic demands. It closely integrates with load balancing to ensure efficient workload distribution across servers. Different scaling policies, such as scheduled, reactive, and predictive scaling, offer distinct approaches to adjusting server capacity. The future of auto scaling looks promising as it continues to evolve, driven by ongoing research and the quest for optimal performance and security in web applications.

Lastly, auto scaling also plays an increasingly important role in web security. For example, a solution that mitigates DDoS attacks should be able to auto scale in order to absorb the high volumes of incoming traffic. And more intensive workloads such as behavioral analysis can also benefit from the resource optimization that auto scaling provides.
