Autoscaling in Kubernetes
What is Autoscaling in Kubernetes?
Kubernetes autoscaling lets you automatically scale nodes and pods up and down based on demand.
Autoscaling helps optimize cloud cost and resource usage in line with that demand.
The Kubernetes autoscaling mechanism uses two layers:
- Pod-based scaling — supported by the Horizontal Pod Autoscaler (HPA) and Vertical Pod Autoscaler (VPA).
- Node-based scaling — supported by the Cluster Autoscaler.
Kubernetes provides pod autoscaling in two forms: horizontal and vertical. HPA automatically updates a workload resource (such as a Deployment or StatefulSet), with the aim of automatically scaling the workload to match demand.
HPA makes use of metrics to determine auto-scaling, as follows:
- For resource metrics, you can either set a target utilization value or a fixed target.
- For custom metrics, only raw values are supported, and you cannot define a target utilization.
- For object metrics and external metrics, scaling is based on a single metric obtained from the object, which is compared to the target value to produce a utilization ratio.
- Horizontal pod autoscaling does not apply to objects that can't be scaled (for example, a DaemonSet).
- Do not use HPA and VPA together on the same CPU and memory metrics.
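To illustrate the external-metric case above, here is a sketch of an External metric source in an autoscaling/v2 HPA spec; the metric name and labels are hypothetical and depend on what your metrics adapter exposes:

```yaml
metrics:
- type: External
  external:
    metric:
      name: queue_messages_ready    # hypothetical metric from an external system (e.g. a message queue)
      selector:
        matchLabels:
          queue: worker_tasks       # hypothetical label selecting one queue
    target:
      type: AverageValue            # target value divided across all replicas
      averageValue: "30"
```

With `AverageValue`, the controller divides the external metric by the current replica count and compares that to the target, producing the ratio used to compute the desired scale.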
HPA works based on metrics. We can collect the resource utilization metrics like CPU and memory by deploying a metrics server. The horizontal pod autoscaling controller, running within the Kubernetes control plane, periodically adjusts the desired scale of its target (for example, a Deployment) to match observed metrics such as average CPU utilization, average memory utilization, or any other custom metric you specify. HorizontalPodAutoscaler controls the scale of a Deployment and its ReplicaSet.
Any HPA target can be scaled based on the resource usage of its pods. When defining the pod specification, resource requests such as CPU and memory should be specified; the HPA controller uses them to compute resource utilization and scale the target up or down. To use resource-utilization-based scaling, specify a resource metric source in the HPA spec.
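As a sketch, an autoscaling/v2 HorizontalPodAutoscaler with a CPU resource metric source might look like this (the names `web` and `web-hpa` are hypothetical; the target Deployment's pods must declare CPU requests):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa                # hypothetical HPA name
spec:
  scaleTargetRef:              # the workload to scale
    apiVersion: apps/v1
    kind: Deployment
    name: web                  # hypothetical Deployment name
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource             # resource metric source
    resource:
      name: cpu
      target:
        type: Utilization      # compare against CPU requests
        averageUtilization: 60 # target average CPU utilization across pods
```

With this in place, the controller keeps the average CPU utilization of the Deployment's pods near 60% by adjusting the replica count between 2 and 10.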
One or more scaling policies can be specified in the behavior section of the spec. When multiple policies are specified, the policy that allows the greatest amount of change is selected by default. A policy can limit change either by a fixed number of pods (type: Pods) or by a percentage of the current replicas (type: Percent).
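Following the upstream HPA documentation, a scale-down behavior combining both policy types can be sketched as follows (the values are illustrative):

```yaml
behavior:
  scaleDown:
    policies:
    - type: Pods        # remove at most 4 pods per 60-second period
      value: 4
      periodSeconds: 60
    - type: Percent     # or at most 10% of current replicas per 60-second period
      value: 10
      periodSeconds: 60
```

For example, at 100 replicas the Percent policy allows removing 10 pods per period while the Pods policy allows only 4, so the Percent policy applies; once the workload shrinks below 40 replicas, the Pods policy allows more change and takes over.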
The stabilization window restricts flapping of the replica count when the metrics used for scaling keep fluctuating. The autoscaling algorithm uses this window to infer a previous desired state and avoid unwanted changes to the workload scale. For example, a stabilization window can be specified for scale-down to delay replica removal.
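A minimal sketch, using the behavior section described above (300 seconds is an illustrative value):

```yaml
behavior:
  scaleDown:
    stabilizationWindowSeconds: 300  # use desired states from the past 5 minutes before scaling down
```

Here the controller scales down only to the highest desired replica count computed within the last five minutes, smoothing out short metric dips.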
The Vertical Pod Autoscaler (VPA) helps size pods with the right amount of CPU and memory. Instead of adjusting CPU and memory requests and limits manually in a Kubernetes Deployment, VPA provides a way to update those requests and limits dynamically.
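As a sketch, assuming the VPA components from the community autoscaler repository are installed in the cluster, a VerticalPodAutoscaler targeting a hypothetical Deployment named `web` could look like this:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-vpa          # hypothetical VPA name
spec:
  targetRef:             # the workload whose pods VPA right-sizes
    apiVersion: apps/v1
    kind: Deployment
    name: web            # hypothetical Deployment name
  updatePolicy:
    updateMode: "Auto"   # VPA evicts and recreates pods with updated requests
```

With `updateMode: "Auto"`, VPA applies its recommendations by evicting pods so they restart with the new requests; `"Off"` can be used instead to only compute recommendations without acting on them.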
The Cluster Autoscaler automatically adjusts the size of the Kubernetes cluster so that no pods are left in a Pending state and, at the same time, the cluster is not overprovisioned. The Cluster Autoscaler works on a per-node-pool basis: when you configure a node pool with it, you specify a minimum and maximum size for that pool.
You can refer to the community autoscaler repository on GitHub for implementation details; each cloud provider requires its own set of configurations.