Autoscaling in Kubernetes

Kubernetes autoscaling automates scaling nodes and pods up and down based on demand.

This helps keep cloud cost and resource usage in line with demand.

The Kubernetes autoscaling mechanism uses two layers:

  • Pod-based scaling — supported by the Horizontal Pod Autoscaler (HPA) and Vertical Pod Autoscaler (VPA).
  • Node-based scaling — supported by the Cluster Autoscaler.

Pod-based Scaling:

Kubernetes scales pods automatically in two ways: horizontally and vertically. The Horizontal Pod Autoscaler (HPA) automatically updates a workload resource (such as a Deployment or StatefulSet), changing its number of pod replicas so that the workload matches demand.


HPA decides when to scale based on metrics, as follows:

  • For resource metrics, you can either set a target utilization value or a fixed target.
  • For custom metrics, only raw values are supported, and you cannot define a target utilization.
  • For object metrics and external metrics, scaling is based on a single metric obtained from the object, which is compared to the target value to produce a utilization ratio.
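
As a rough sketch of how the non-resource metric types look in an autoscaling/v2 HorizontalPodAutoscaler, the fragment below lists one pods metric and one object metric under spec.metrics. The metric names packets-per-second and requests-per-second and the Ingress named main-route are illustrative assumptions; they are only available if a metrics adapter in your cluster exposes them.

```yaml
# Fragment of spec.metrics in an autoscaling/v2 HorizontalPodAutoscaler.
# Metric names and the Ingress name are hypothetical.
metrics:
- type: Pods
  pods:
    metric:
      name: packets-per-second
    target:
      type: AverageValue      # raw value averaged across pods, not a utilization percentage
      averageValue: 1k
- type: Object
  object:
    metric:
      name: requests-per-second
    describedObject:
      apiVersion: networking.k8s.io/v1
      kind: Ingress
      name: main-route
    target:
      type: Value             # single value compared against the target
      value: 10k
```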

Limitations:

  • Horizontal pod autoscaling does not apply to objects that can’t be scaled (for example, a DaemonSet).
  • Do not use HPA and VPA together on the same workload when both act on CPU or memory, because the two autoscalers would work against each other.


HPA works based on metrics. Resource utilization metrics such as CPU and memory can be collected by deploying Metrics Server. The horizontal pod autoscaling controller, running within the Kubernetes control plane, periodically (every 15 seconds by default) adjusts the desired scale of its target (for example, a Deployment) to match observed metrics such as average CPU utilization, average memory utilization, or any other custom metric you specify. For a Deployment, the HorizontalPodAutoscaler manages the Deployment's replica count, and the Deployment in turn resizes its ReplicaSet.
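
The adjustment the controller makes can be summarized with the replica formula given in the Kubernetes documentation. The numbers in the worked example are assumed for illustration only.

```latex
\[
\text{desiredReplicas} =
  \left\lceil \text{currentReplicas} \times
  \frac{\text{currentMetricValue}}{\text{desiredMetricValue}} \right\rceil
\]
% Assumed example: 4 replicas, observed average CPU utilization 90%, target 60%
\[
\left\lceil 4 \times \frac{90}{60} \right\rceil = \lceil 6 \rceil = 6
\]
```

When the ratio of current to desired metric value is close enough to 1 (within a tolerance of 0.1 by default), the controller skips scaling altogether.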

Resource metrics:

Any HPA target can be scaled based on the resource usage of the pods in the scaling target. When defining the pod specification, specify resource requests such as cpu and memory. The HPA controller uses these requests to compute resource utilization and to scale the target up or down. To use resource utilization based scaling, specify a metric source like this:

type: Resource
resource:
  name: cpu
  target:
    type: Utilization
    averageUtilization: 60
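
To show where that metric source sits, here is a minimal sketch of a Deployment with resource requests and an HPA that targets it. The name web, the nginx image, the request sizes, and the replica bounds are all assumptions chosen for illustration.

```yaml
# Hypothetical Deployment "web"; names, image, and numbers are illustrative.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: nginx:1.25
        resources:
          requests:          # utilization is computed relative to these requests
            cpu: 250m
            memory: 128Mi
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web
spec:
  scaleTargetRef:            # the workload whose replica count the HPA manages
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60
```

After applying both manifests with kubectl apply -f, kubectl get hpa web should show the current and target utilization, provided Metrics Server is running in the cluster.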

Scaling policies:

One or more scaling policies can be specified in the behavior section of the spec. When multiple policies are specified, the policy that allows the highest amount of change is selected by default. The following example shows this behavior while scaling down:

behavior:
  scaleDown:
    policies:
    - type: Pods
      value: 4
      periodSeconds: 60
    - type: Percent
      value: 10
      periodSeconds: 60
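
With the two policies above, the Pods policy allows at most 4 replicas to be removed per minute and the Percent policy allows at most 10% of the current replicas per minute. At 80 replicas, 10% is 8, which is more than 4, so the Percent policy governs; once the workload is at 40 replicas or fewer, the Pods policy allows the larger change and takes over. If you would rather have the most conservative policy win, a sketch using the selectPolicy field looks like this:

```yaml
# Same scale-down policies, but the policy allowing the smallest change is selected.
behavior:
  scaleDown:
    selectPolicy: Min        # default is Max; Disabled turns off scaling in this direction
    policies:
    - type: Pods
      value: 4
      periodSeconds: 60
    - type: Percent
      value: 10
      periodSeconds: 60
```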

Stabilization window:

The stabilization window restricts flapping of the replica count when the metrics used for scaling keep fluctuating. The autoscaling algorithm uses this window to infer a previous desired state and avoid unwanted changes to the workload scale.

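For example, in the following snippet from the behavior section, a stabilization window is specified for scaleDown:

scaleDown:
  stabilizationWindowSeconds: 300

With this setting, when the metrics indicate that the target should be scaled down, the algorithm looks at the desired states computed over the previous 5 minutes and uses the highest value from that interval.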
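
For context, here is a sketch of how the stabilization window sits alongside policies for both scaling directions. The values mirror the defaults described in the Kubernetes documentation for autoscaling/v2; treat them as a starting point rather than a recommendation.

```yaml
# behavior section of an autoscaling/v2 HorizontalPodAutoscaler spec.
behavior:
  scaleDown:
    stabilizationWindowSeconds: 300   # wait out 5 minutes of fluctuation before shrinking
    policies:
    - type: Percent
      value: 100
      periodSeconds: 15
  scaleUp:
    stabilizationWindowSeconds: 0     # react to load increases immediately
    policies:
    - type: Percent
      value: 100
      periodSeconds: 15
    - type: Pods
      value: 4
      periodSeconds: 15
    selectPolicy: Max
```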


The Vertical Pod Autoscaler (VPA) helps size pods with the right amount of CPU and memory. Instead of adjusting CPU and memory settings manually in a Kubernetes Deployment, VPA dynamically updates the CPU and memory requests and limits of the pods.
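
A minimal sketch of a VPA object follows. VPA is not part of core Kubernetes; this assumes its custom resource definitions and controllers have been installed in the cluster. The names web and web-vpa and the min/max bounds are hypothetical.

```yaml
# Hypothetical VerticalPodAutoscaler for a Deployment named "web".
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  updatePolicy:
    updateMode: "Auto"       # "Off" only records recommendations without applying them
  resourcePolicy:
    containerPolicies:
    - containerName: "*"     # apply the bounds to every container in the pod
      minAllowed:
        cpu: 100m
        memory: 50Mi
      maxAllowed:
        cpu: "1"
        memory: 500Mi
      controlledResources: ["cpu", "memory"]
```

Starting with updateMode set to "Off" is a cautious way to inspect the recommendations before letting VPA change running pods.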

Node-based Scaling:

Cluster Autoscaler automatically adjusts the size of the Kubernetes cluster so that no pods are left in the Pending state for lack of capacity and, at the same time, the cluster is not overprovisioned. Cluster Autoscaler works on a per-node-pool basis. When you configure a node pool with Cluster Autoscaler, you specify a minimum and maximum size for the node pool.
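
On a self-managed cluster, the minimum and maximum are typically passed to the cluster-autoscaler binary through its --nodes flag. The fragment below is a sketch only: the AWS provider and the node group name my-node-group are assumptions, and managed Kubernetes services usually expose the same bounds through their own tooling instead.

```yaml
# Fragment of the cluster-autoscaler container spec; provider and group name are hypothetical.
command:
- ./cluster-autoscaler
- --cloud-provider=aws
- --nodes=1:10:my-node-group     # min:max:node group name
```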

For implementation details, refer to the Cluster Autoscaler project on GitHub (the kubernetes/autoscaler repository). Each cloud provider requires its own set of configuration.




Devops Advocate

Reddeppa S
