Graceful Shutdown of Kubernetes Pods Without Dropping Traffic
This post explains how to avoid dropped connections (5xx errors) during rolling upgrades in Kubernetes. Let's get into the details.
What happens when a Pod gets deleted?
When you type
kubectl delete pod
the pod is marked for deletion, and the endpoint controller removes its IP address and port (its endpoint) from the Service and from etcd. You can observe this with
kubectl describe service
But that’s not enough! Several components keep their own local copy of the endpoint list:
- kube-proxy keeps a local list of endpoints to write iptables rules.
- CoreDNS uses the endpoints to reconfigure DNS entries.
The same is true for the Ingress controller, Istio, and similar components.
All those components will (eventually) remove the previous endpoint so that no traffic can ever reach it again. At the same time, the kubelet is also notified of the change and deletes the pod.
What happens when the kubelet deletes the pod before the rest of the components?
Unfortunately, you will experience downtime, because components such as kube-proxy, CoreDNS, and the ingress controller still route traffic to that IP address. So what can you do?
Wait!
If you wait long enough before deleting the pod, in-flight requests can still complete, and new traffic is routed to other pods.
How are you supposed to wait?
When the kubelet deletes a pod, it goes through the following steps:
1. Triggers the `preStop` hook (if any).
2. Sends the SIGTERM signal.
3. Sends the SIGKILL signal (after 30 seconds by default).
You can use the `preStop` hook to insert an artificial delay.
You can listen for the SIGTERM signal in your app and wait. You can then gracefully stop the process and exit when you are done waiting. Kubernetes gives you 30 seconds to do so (configurable).
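As a minimal sketch of the first option, an artificial delay in the `preStop` hook could look like the manifest below. The pod name, the image, and the 15-second value are assumptions chosen for illustration, and the example assumes the container image ships a `sleep` binary.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: graceful-demo            # hypothetical name
spec:
  containers:
    - name: app
      image: nginx:1.25          # placeholder image; must contain `sleep`
      lifecycle:
        preStop:
          exec:
            # Keep the container serving for 15 s (illustrative value) so
            # kube-proxy, ingress controllers, etc. have time to drop the
            # endpoint before the kubelet sends SIGTERM.
            command: ["sleep", "15"]
```

While the hook sleeps, the container keeps running and can still answer requests that arrive through stale routing rules.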
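Once termination starts, Kubernetes also starts a countdown called the termination grace period. By default, this is 30 seconds. It’s important to note that this countdown runs in parallel with the preStop hook and the SIGTERM signal: it begins when termination begins, not when the hook returns. The kubelet sends SIGTERM only after the preStop hook has finished, but it does not wait for the hook beyond the grace period, so any time spent in preStop is subtracted from the time your app has left to handle SIGTERM.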
If your app finishes shutting down and exits before the terminationGracePeriod is done, Kubernetes moves to the next step immediately.
If your pod usually takes longer than 30 seconds to shut down, make sure you increase the grace period. You can do that by setting the terminationGracePeriodSeconds option in the Pod YAML.
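A hedged example of raising the grace period is shown below. The pod name, the image, and the 60-second value are assumptions; the value should cover any `preStop` delay plus the time your app needs after SIGTERM (for instance, a 15-second delay plus up to roughly 40 seconds of shutdown work).

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: slow-shutdown-demo                  # hypothetical name
spec:
  # Total budget, in seconds, shared by the preStop hook and SIGTERM handling.
  # When it expires, any container still running receives SIGKILL.
  terminationGracePeriodSeconds: 60         # illustrative value
  containers:
    - name: app
      image: registry.example.com/app:1.0   # placeholder image
```

For a Deployment, the same field goes under `spec.template.spec`, since that is the Pod template used during rolling upgrades.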
Should you wait 10 seconds, 20, or 30? There’s no single answer. While propagating endpoint changes may take only a few seconds, Kubernetes guarantees neither how long it takes nor that all components will finish at the same time.