Upgrade AKS Kubernetes cluster with zero downtime

Reddeppa S
2 min read · Sep 15, 2021


Upgrading managed Kubernetes clusters is a bit different from upgrading self-managed clusters.

An AKS cluster upgrade can be done in multiple ways to minimize the impact on production environments.

More importantly, Kubernetes upgrades must be done sequentially: you cannot skip minor versions.

Suppose your cluster is running 1.18.x and you want to upgrade it to 1.21.x.

The upgrade path will be:

1.18.x → 1.19.x, 1.19.x → 1.20.x, and finally 1.20.x → 1.21.x

Skipping multiple versions can only be done when upgrading from an unsupported version back to a supported version. For example, an upgrade from an unsupported 1.11.x to a supported 1.18.x can be completed.
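Before starting an upgrade, it is worth checking which versions the cluster can actually move to. A minimal check, using the same placeholder names as the commands below:

az aks get-upgrades \
--resource-group <resource group name> \
--name <aks cluster name> \
--output table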

Upgrade control plane and nodes together:

az aks upgrade \
--resource-group <resource group name> \
--name <aks cluster name> \
--kubernetes-version <Kubernetes version>

The AKS upgrade command performs the following steps:

  1. Upgrades the control plane to the desired version
  2. Adds buffer nodes based on the max surge value set in the node pool configuration (see the example after this list)
  3. Cordons and drains one of the old nodes
  4. Once an old node is fully drained, it is reimaged to the new version and becomes a buffer node
  5. The process continues until all the nodes are upgraded
  6. Finally, the buffer nodes are deleted
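The number of buffer nodes is controlled by the node pool's max surge setting. As a sketch, it can be tuned per node pool before the upgrade (the 33% value here is only an example):

az aks nodepool update \
--resource-group <resource group name> \
--cluster-name <aks cluster name> \
--name <node pool name> \
--max-surge 33%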

This is a simple and straightforward approach; however, it has a higher impact on services because Azure will not wait for all pods to be recreated on the new nodes before draining the next one.

Upgrade control plane alone and create a new node pool:

1. Upgrade the control plane to the desired Kubernetes version
az aks upgrade \
--control-plane-only \
--resource-group <resource group name> \
--name <aks cluster name> \
--kubernetes-version <Kubernetes version>
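At this point only the control plane is on the new version; the existing node pool still reports the old one. One way to confirm before moving on:

az aks show \
--resource-group <resource group name> \
--name <aks cluster name> \
--query kubernetesVersion \
--output tsv

az aks nodepool list \
--resource-group <resource group name> \
--cluster-name <aks cluster name> \
--query "[].{name:name, version:orchestratorVersion}" \
--output table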

2. Create a new node pool

az aks nodepool add \
--cluster-name <aks cluster name> \
--resource-group <resource group name> \
--name <new node pool name> \
--mode "System" \
--node-vm-size <Node SKU> \
--node-count <node count> \
--max-pods <max pods>
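Once the new pool is provisioned, the new nodes should show up alongside the old ones, already running the target Kubernetes version:

kubectl get nodes -o wide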

3. Cordon all nodes in the old node pool

kubectl cordon <node>
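Cordoning node by node works, but since AKS normally labels each node with agentpool=<pool name>, the whole old pool can be cordoned in one pass (the label value below is assumed to match your old pool's name):

kubectl get nodes -l agentpool=<old pool name> -o name | xargs -n1 kubectl cordon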

4. Drain the nodes

Drain one node at a time and wait for all the pods to be recreated on the new nodes. Make sure all the pods are healthy before draining the next node.

kubectl drain --delete-emptydir-data --ignore-daemonsets <node>
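After each drain, it helps to confirm that everything has been rescheduled and is healthy before touching the next node. One quick check is to list any pods that are not Running or Completed:

kubectl get pods --all-namespaces --field-selector 'status.phase!=Running,status.phase!=Succeeded'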

5. Delete the old node pool once all the nodes are cordoned and drained

az aks nodepool delete --cluster-name <aks cluster name> --name <old node pool name> --resource-group <resource group name>

With this approach, the administrator has more control over draining the nodes, so there is minimal impact on the applications.
