Kubernetes Autoscaling Enigma: How To Get It Right?

March 05, 2021

Applications should benefit from the elasticity offered by the cloud so that they can handle varying demand without a drop in performance. Yet autoscaling on the cloud still looks like an enigma. Why so? Clusters, containers, container orchestration, Kubernetes, pods, and nodes have been around for five years or more. By IT norms, five years is long enough for a new technology to be thoroughly understood. Even so, Kubernetes autoscaling is often adopted readily, without an understanding of the implications of all its forms.

Understand the implications of autoscaling in all its forms before rushing to adopt it on the cloud.

For those who find adopting scalability forms such as Kubernetes autoscaling, or even container orchestration in general, a challenge, here is a run-through of which form suits which kind of application.

Teams working on a cloud-native application generally do not worry about scalability, because the application is meant to run on the cloud. But the cloud does not handle your application's traffic spikes automatically, that is, without you doing anything. You cannot assume that elasticity is available out of the box simply because you are on the cloud. Infrastructure as a Service (IaaS) lets you set the scalability options for your application by offering various forms that apply at the node, pod, and container orchestration levels. However, the application designer or architect must decide which form of scalability fits the application. The application architecture and design must be able to respond to changing demand within the system constraints, and much of that agility depends on choosing the right form of scalability.

What is Autoscaling?

Autoscaling is the automatic resizing of CPU and memory resources. It can happen at the cluster and node level or at the pod level. The key word is resizing: resources are not only increased but also decreased as the demands of the workload wax and wane.

Cluster- and Node-Level Autoscaling

Cluster- and node-level autoscaling automatically increases or decreases the number of nodes in a cluster. It is managed by a cluster autoscaler. When configuring it, you set limits, and the autoscaler adds or removes nodes within those limits. A key point is that the change in the number of nodes is based on resource requests and not on actual utilization. The autoscaler checks not only whether the existing nodes have too little capacity to satisfy pending resource requests, but also whether node taints are matched by the pods' tolerations. It also continuously scans for nodes whose pods could be rescheduled onto other nodes. When it finds one, it triggers the eviction of those pods and then removes the emptied node. This form works well for workloads that are designed for horizontal scaling.
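To make the "requests, not utilization" point concrete, here is a minimal sketch in Python. It is not the cluster autoscaler's actual algorithm: the `Node` class, the `nodes_to_add` function, and the first-fit placement are all assumptions made for illustration, and taints and tolerations are left out. It only shows that the decision to add nodes can be made from the sum of pod requests and a configured node limit.

```python
from dataclasses import dataclass


@dataclass
class Node:
    cpu_m: int            # allocatable CPU in millicores
    mem_mib: int          # allocatable memory in MiB
    req_cpu_m: int = 0    # sum of CPU requests of pods placed on this node
    req_mem_mib: int = 0  # sum of memory requests of pods placed on this node

    def try_place(self, cpu_m: int, mem_mib: int) -> bool:
        """Place a pod if its requests fit; actual usage is never consulted."""
        if self.req_cpu_m + cpu_m > self.cpu_m:
            return False
        if self.req_mem_mib + mem_mib > self.mem_mib:
            return False
        self.req_cpu_m += cpu_m
        self.req_mem_mib += mem_mib
        return True


def nodes_to_add(nodes, pending, new_node_cpu_m, new_node_mem_mib, max_nodes):
    """Count the new nodes needed for pending (cpu_m, mem_mib) requests.

    Hypothetical first-fit model: a pending pod goes to the first node with
    enough unrequested capacity; otherwise a new node is added, unless the
    configured max_nodes limit has been reached.
    """
    # Work on copies so the caller's view of the cluster is not modified.
    pool = [Node(n.cpu_m, n.mem_mib, n.req_cpu_m, n.req_mem_mib) for n in nodes]
    added = 0
    for cpu_m, mem_mib in pending:
        if any(n.try_place(cpu_m, mem_mib) for n in pool):
            continue  # fits on an existing (or already added) node
        if len(pool) >= max_nodes:
            continue  # limit reached: the pod stays pending
        fresh = Node(new_node_cpu_m, new_node_mem_mib)
        if fresh.try_place(cpu_m, mem_mib):
            pool.append(fresh)
            added += 1
    return added
```

For example, with one node whose CPU is 90% requested (even if it is mostly idle), three pending pods requesting 500m, 500m, and 1500m of CPU lead to two added nodes under a limit of three, and only one under a limit of two.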

Pod-Level Horizontal Autoscaling

Pod-level horizontal autoscaling automatically adds or removes pod replicas for the same workload based on demand. It is managed through a horizontal pod autoscaler, which uses CPU utilization, memory utilization, or even network traffic as the metric for deciding whether to increase or decrease the number of pods. You can specify the target utilization as a percentage or as an absolute value. The autoscaler monitors the workload and adds or removes pods, within the specified limits, to bring the metric closer to the target. It can also use custom metrics. These metrics must be exported from the application to the monitoring system and be available there. Examples include the number of outstanding email messages and the number of outstanding orders waiting to be processed. This form, again, works well for workloads that are designed for horizontal scaling.
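The scaling decision described above can be sketched with the ratio rule given in the Kubernetes horizontal pod autoscaler documentation: desired replicas = ceil(current replicas × current metric / target metric), with no change when the ratio is within a small tolerance of 1. The Python function below is only an illustration of that rule; the function name, the parameter names, and the final clamp to `min_replicas` and `max_replicas` are this sketch's own, and the real autoscaler handles further cases (missing metrics, pods that are not ready, stabilization windows) that are ignored here.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas, max_replicas, tolerance=0.1):
    """Return the replica count that moves the metric toward its target.

    current_metric and target_metric must use the same unit, for example
    average CPU utilization in percent or outstanding orders per pod.
    """
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas  # close enough to the target: no change
    else:
        desired = math.ceil(current_replicas * ratio)
    # Always stay within the configured limits.
    return max(min_replicas, min(max_replicas, desired))


# 4 pods at 90% average CPU with a 60% target: scale out.
print(desired_replicas(4, 90, 60, min_replicas=2, max_replicas=10))  # 6
# Custom metric: 30 outstanding orders per pod, target 10, capped at 5 pods.
print(desired_replicas(2, 30, 10, min_replicas=1, max_replicas=5))   # 5
```

The same function covers resource metrics and custom metrics, because the rule only needs a current value and a target value in the same unit.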

Pod-Level Vertical Autoscaling

Pod-level vertical autoscaling increases or decreases the amount of CPU and memory available to a pod based on demand. It is managed through a vertical pod autoscaler. As with the other forms, autoscaling happens within the specified limits. However, two things stand out in this form. First, it can be used simply to recommend how many resources the pod deployment needs. Second, the recommendation is based on an analysis of the CPU and memory needs of the containers, and the autoscaling on an analysis of live requests. Therefore, no guesswork or benchmarking is involved, and resource allocation is better matched to actual usage. You can also exclude specific containers from autoscaling. The automatic scaling mode adjusts a pod's CPU and memory requests. In this mode, the running pod is not adjusted in place; it is terminated, and a new pod is created with the updated requests. This form works well for workloads that cannot be spread across multiple instances.
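As a rough illustration of a usage-based recommendation that stays within specified limits, the Python sketch below picks a high percentile of observed usage samples and clamps it to an allowed range. This is a simplified assumption, not the vertical pod autoscaler's actual recommender: the function `recommend_request`, its parameters, and the choice of the 90th percentile are hypothetical and used only to show the idea.

```python
import math


def recommend_request(usage_samples, percentile=90,
                      min_allowed=None, max_allowed=None):
    """Suggest a resource request from observed usage samples.

    usage_samples: observed usage in one unit (e.g. CPU millicores or MiB).
    percentile: integer 1..100; nearest-rank percentile of the samples.
    min_allowed / max_allowed: optional bounds for the recommendation.
    """
    if not usage_samples:
        raise ValueError("at least one usage sample is required")
    ordered = sorted(usage_samples)
    # Nearest-rank percentile: the smallest sample that covers the given
    # share of all observations.
    rank = max(1, math.ceil(percentile * len(ordered) / 100))
    value = ordered[rank - 1]
    if min_allowed is not None:
        value = max(value, min_allowed)
    if max_allowed is not None:
        value = min(value, max_allowed)
    return value


cpu_millicores = [100, 120, 110, 400, 130, 125, 115, 105, 135, 140]
print(recommend_request(cpu_millicores))                   # 140
print(recommend_request(cpu_millicores, max_allowed=130))  # 130
```

Using a percentile rather than the maximum keeps a single spike (the 400m sample here) from inflating the request, while the bounds play the role of the specified limits.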


Deciding on the form of autoscaling requires a lot more attention than one might assume. One cannot leave things to the IaaS provider just because the application runs on the cloud or is 'cloud native'.

Below are the conclusions:

  • IaaS offers various forms of autoscaling to choose from. Due diligence is needed in deciding which form to apply.
  • IaaS also lets you configure scalability based on the workload's resource demands and on metrics of your choice. You can send metrics on any parameter to your monitoring framework and use them as a trigger for autoscaling. This allows finer granularity in defining autoscaling triggers than resource utilization alone.