Understanding Horizontal Pod Autoscaler using a demo on a local k8s cluster

“Resilience is our ability to bounce back from life’s challenges and to thrive, grow and expand.”
HorizontalPodAutoscaler (HPA) is a separate resource type in Kubernetes that scales the number of pods based on CPU utilization, memory utilization, or custom metrics. HPA helps to optimise the number of replicas that need to be maintained for your applications in an environment, which helps in distributing load. Behind the scenes, the autoscaler controller updates the replica count of a k8s resource such as a Deployment, ReplicaSet or StatefulSet.
The value it brings to the table is a more resilient application that can take care of itself when demand increases. But there must be enough physical resources in the cluster for the pods to expand into. In this article I will try to explain HPA along with a demo.
The HPA controller periodically checks metrics. When the average CPU or memory usage goes too high, it tells k8s to increase the replica count of the target deployment. So it needs to know how to get the metrics from the cluster in the first place; we will have to connect the controller to a metrics collector. Values for CPU, memory etc. are averaged across your pods. The most important thing is to have resource requests and limits defined in your deployments. Then provide the minimum and maximum pod count in the HPA configuration. HPA will also scale down after a cooldown period. It uses the formula below to calculate the number of replicas to maintain.
Algorithm to calculate Replicas
desired_replicas = ceil(current_replicas * (current_value / target_value))
To understand this, let's consider two different scenarios where a scaling decision needs to be taken.
Scenario 1
Assume that an application has some business requirement and, based on the physical limits, we come up with ideal CPU/memory numbers. We wish to maintain the average CPU utilization of the pods below 60%. Then there is a spike in traffic and the current utilization reaches 90%. The deployment object has 3 replicas defined and the pods are taking heavy load. Let's use the algorithm to find the desired pod count.
Target CPU utilization : 60%
Current Utilization: 90%
Current pods: 3
Desired Pods = ceil(current_pods * (current value/ target value))
Desired Pods = ceil(3*(.9/.6)) = 5
In this scenario, the HPA controller will change the replicas in the deployment to 5, and the deployment controller will meet the need by adding 2 more pods.
Scenario 2
Let's assume that after some time the traffic has come down, and the current utilization is now 20%. We want the additional replicas to be removed automatically. Let's see how the HPA controller grants our wish.
Target CPU Utilization : 60%
Current Utilization: 20%
Current pods : 5
Desired Pods = ceil(5*(.2/.6)) = 2
Here it determines that only 2 replicas are sufficient and tells the deployment controller to reduce to that number. But the HPA has a minimum replica count of 3, so Kubernetes will honor that and reduce the replica count from 5 to 3.
Lab Setup
I am going to use a “kind” cluster on my local laptop. kind stands for Kubernetes in Docker, and it is good for demos and for testing Kubernetes applications. You get the flexibility to add worker nodes and manage multiple clusters using kind. For more details check out my article on kind: https://faun.pub/from-minikube-to-kind-c5b3a5cb95
I am using kind's extraPortMapping feature while creating the cluster to forward ports from the host to the ingress controller. We will be using the nginx ingress controller.
kind_cluster.yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
name: hpacluster
nodes:
- role: control-plane
  kubeadmConfigPatches:
  - |
    kind: InitConfiguration
    nodeRegistration:
      kubeletExtraArgs:
        node-labels: "ingress-ready=true"
  extraPortMappings:
  - containerPort: 80
    hostPort: 80
    protocol: TCP
  - containerPort: 443
    hostPort: 443
    protocol: TCP
- role: worker
- role: worker
- role: worker
- role: worker
Create the k8s cluster
asishs-MacBook-Air:kind$ kind create cluster --config kind_cluster.yaml
Creating cluster "hpacluster" ...
✓ Ensuring node image (kindest/node:v1.21.1) 🖼
✓ Preparing nodes 📦 📦 📦 📦 📦
✓ Writing configuration 📜
✓ Starting control-plane 🕹️
✓ Installing CNI 🔌
✓ Installing StorageClass 💾
✓ Joining worker nodes 🚜
Set kubectl context to "kind-hpacluster"
You can now use your cluster with:

kubectl cluster-info --context kind-hpacluster

Have a nice day! 👋
Check the cluster.
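For instance, listing the nodes (the kubectl context kind-hpacluster is already set by kind) should show one control-plane node and the four workers we defined:

kubectl get nodes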
Let's create our deploy object. I am using an nginx image for this.
frontend.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: frontend
  name: frontend
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: frontend
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: frontend
    spec:
      containers:
      - image: nginx
        imagePullPolicy: Always
        name: nginx
        ports:
        - containerPort: 80
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
frontend-service.yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    app: frontend
  name: frontend-svc
spec:
  ports:
  - name: http
    port: 80
    protocol: TCP
    targetPort: 80
  selector:
    app: frontend
Install the nginx ingress controller. The manifest contains kind-specific patches to forward the hostPorts to the ingress controller.
kubectl -n ingress-nginx apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/master/deploy/static/provider/kind/deploy.yaml
Wait till the ingress controller is ready to process requests:
kubectl wait --namespace ingress-nginx \
--for=condition=ready pod \
--selector=app.kubernetes.io/component=controller \
--timeout=90s
Create the ingress manifest for the frontend service.
frontend-ingress.yaml
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: frontend-ingress
spec:
  rules:
  - http:
      paths:
      - path: /
        backend:
          serviceName: frontend-svc
          servicePort: 80
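With the file names used above, all three objects can be applied in one go:

kubectl apply -f frontend.yaml -f frontend-service.yaml -f frontend-ingress.yaml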
After applying the above objects, we should be able to reach the nginx service on localhost.
asishs-MacBook-Air:hpa$ kubectl get pods
NAME                        READY   STATUS    RESTARTS   AGE
frontend-86968456b9-p7nc2   1/1     Running   0          60m

asishs-MacBook-Air:hpa$ kubectl get svc
NAME           TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)   AGE
frontend-svc   ClusterIP   10.96.161.97   <none>        80/TCP    60m
kubernetes     ClusterIP   10.96.0.1      <none>        443/TCP   61m

asishs-MacBook-Air:hpa$ kubectl get ingress
NAME               CLASS    HOSTS   ADDRESS     PORTS   AGE
frontend-ingress   <none>   *       localhost   80      57m

asishs-MacBook-Air:hpa$ curl -I http://localhost
HTTP/1.1 200 OK
Date: Sun, 20 Jun 2021 16:28:19 GMT
Content-Type: text/html
Content-Length: 612
Connection: keep-alive
Last-Modified: Tue, 25 May 2021 12:28:56 GMT
ETag: "60aced88-264"
Accept-Ranges: bytes
Let's add our HPA manifest and see what happens when we apply it directly.
hpa.yaml
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: frontend-hpa
  namespace: default
spec:
  minReplicas: 3
  maxReplicas: 10
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: frontend
  targetCPUUtilizationPercentage: 10
We are telling HPA to keep the target CPU utilization at 10%. Create the HPA object.
asishs-MacBook-Air:hpa$ kubectl apply -f hpa.yaml
horizontalpodautoscaler.autoscaling/frontend-hpa created
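As a side note, roughly the same HPA object could have been created imperatively with kubectl autoscale instead of a manifest:

kubectl autoscale deployment frontend --cpu-percent=10 --min=3 --max=10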
asishs-MacBook-Air:hpa$ kubectl get hpa
NAME           REFERENCE             TARGETS         MINPODS   MAXPODS   REPLICAS   AGE
frontend-hpa   Deployment/frontend   <unknown>/10%   3         10        1          29s
Notice that the target percentage is shown as “unknown”, which means our HPA controller is not able to get resource metrics for this deployment. We can use the basic metrics server to capture them. If we need more advanced metrics, we can consider monitoring solutions like Prometheus.
Let's install the metrics server in our cluster.
asishs-MacBook-Air:hpa$ kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
serviceaccount/metrics-server created
clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader created
clusterrole.rbac.authorization.k8s.io/system:metrics-server created
rolebinding.rbac.authorization.k8s.io/metrics-server-auth-reader created
clusterrolebinding.rbac.authorization.k8s.io/metrics-server:system:auth-delegator created
clusterrolebinding.rbac.authorization.k8s.io/system:metrics-server created
service/metrics-server created
deployment.apps/metrics-server created
apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io created
Check that we can gather metrics from the cluster:
asishs-MacBook-Air:hpa$ k top nodes
W0620 22:53:30.277142 49629 top_node.go:119] Using json format to get metrics. Next release will switch to protocol-buffers, switch early by passing --use-protocol-buffers flag
Error from server (ServiceUnavailable): the server is currently unable to handle the request (get nodes.metrics.k8s.io)
When I check the logs of the metrics-server pod, I see some certificate-related errors:
E0620 17:29:41.525715 1 scraper.go:139] "Failed to scrape node" err="Get \"https://172.18.0.4:10250/stats/summary?only_cpu_and_memory=true\": x509: cannot validate certificate for 172.18.0.4 because it doesn't contain any IP SANs" node="hpacluster-worker3"
E0620 17:29:41.534082 1 scraper.go:139] "Failed to scrape node" err="Get \"https://172.18.0.6:10250/stats/summary?only_cpu_and_memory=true\": x509: cannot validate certificate for 172.18.0.6 because it doesn't contain any IP SANs" node="hpacluster-worker4"
Let's work around this by telling the metrics server to skip kubelet TLS verification (acceptable for a local demo). Below is the complete set of args for the metrics-server deploy manifest which I am using.
...
spec:
  containers:
  - args:
    - --cert-dir=/tmp
    - --secure-port=443
    - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
    - --kubelet-use-node-status-port
    - --metric-resolution=15s
    - --kubelet-insecure-tls
...
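One way to add the missing flag without editing the downloaded manifest by hand is a JSON patch that appends it to the container args (this assumes the upstream manifest's kube-system namespace and a single container in the Deployment):

kubectl -n kube-system patch deployment metrics-server --type=json \
  -p='[{"op": "add", "path": "/spec/template/spec/containers/0/args/-", "value": "--kubelet-insecure-tls"}]'
kubectl -n kube-system rollout status deployment/metrics-server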
Now let's check the metrics server again.
asishs-MacBook-Air:hpa$ kubectl top nodes
W0621 07:24:43.894564 52298 top_node.go:119] Using json format to get metrics. Next release will switch to protocol-buffers, switch early by passing --use-protocol-buffers flag
NAME                       CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
hpacluster-control-plane   184m         4%     573Mi           28%
hpacluster-worker          126m         3%     122Mi           6%
hpacluster-worker2         25m          0%     106Mi           5%
hpacluster-worker3         85m          2%     93Mi            4%
hpacluster-worker4         74m          1%     93Mi            4%
The metrics server looks good now. Let's check the HPA.
asishs-MacBook-Air:hpa$ kubectl get hpa
NAME           REFERENCE             TARGETS         MINPODS   MAXPODS   REPLICAS   AGE
frontend-hpa   Deployment/frontend   <unknown>/10%   3         10        3          85s
Okay, the HPA is still showing unknown. The missing part is adding resource requests and limits to the deploy object. Let's add them and see.
deploy manifest for frontend (container section with resources added)
spec:
  containers:
  - image: nginx
    imagePullPolicy: Always
    name: nginx
    ports:
    - containerPort: 80
      protocol: TCP
    resources:
      limits:
        cpu: 600m
        memory: 128Mi
      requests:
        cpu: 200m
        memory: 64Mi
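Apply the updated manifest and wait for the rollout to finish (assuming the file is still named frontend.yaml):

kubectl apply -f frontend.yaml
kubectl rollout status deployment/frontend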
HPA starts working only after we add resource requests and limits to the deploy object. Now let's check the HPA again.
asishs-MacBook-Air:hpa$ kubectl get hpa
NAME           REFERENCE             TARGETS         MINPODS   MAXPODS   REPLICAS   AGE
frontend-hpa   Deployment/frontend   <unknown>/10%   3         10        3          9m51s

asishs-MacBook-Air:hpa$ kubectl get hpa
NAME           REFERENCE             TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
frontend-hpa   Deployment/frontend   0%/10%    3         10        3          10m
Be patient here, as the HPA controller takes some time to reflect the change. After a while the TARGETS column shows the current utilization; in our case it shows 0% current utilization against the 10% target, along with the min and max pods and the replica count.
Let us hit our service with some traffic using the Apache Benchmark (ab) tool. Here I am starting a load against our frontend deploy object, which currently has 3 replica pods. When the traffic increases, we should see a spike in CPU and memory utilisation for the pods, which triggers the HPA controller to increase the replicas. When the ab test finishes, we should see the load reducing and, correspondingly, the HPA controller reducing the pod count back to the initial replica count.
1. Start the traffic to the service:
asishs-MacBook-Air:kind$ ab -n 1000000 -c 100 http://localhost/
This is ApacheBench, Version 2.3 <$Revision: 1879490 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient)

Server Software:
Server Hostname:        localhost
Server Port:            80

Document Path:          /
Document Length:        0 bytes

Concurrency Level:      100
Time taken for tests:   224.176 seconds
Complete requests:      98073
Failed requests:        0
Total transferred:      0 bytes
HTML transferred:       0 bytes
Requests per second:    437.48 [#/sec] (mean)
Time per request:       228.581 [ms] (mean)
Time per request:       2.286 [ms] (mean, across all concurrent requests)
Transfer rate:          0.00 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    1 129.0      0   19662
Processing:     0    1   4.3      1     544
Waiting:        0    0   0.0      0       0
Total:          0    2 129.1      1   19663

Percentage of the requests served within a certain time (ms)
  50%      1
  66%      1
  75%      1
  80%      1
  90%      1
  95%      1
  98%      2
  99%      2
 100%  19663 (longest request)
2. Check HPA resource usage
asishs-MacBook-Air:hpa$ kubectl get hpa
NAME           REFERENCE             TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
frontend-hpa   Deployment/frontend   0%/10%    3         10        3          138m

asishs-MacBook-Air:hpa$ k get hpa -w
NAME           REFERENCE             TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
frontend-hpa   Deployment/frontend   0%/10%    3         10        3          138m
frontend-hpa   Deployment/frontend   3%/10%    3         10        3          138m
frontend-hpa   Deployment/frontend   25%/10%   3         10        3
3. Check the pod resource usage using the metrics server
asishs-MacBook-Air:hpa$ kubectl top pods
W0621 09:56:37.103195 53592 top_pod.go:140] Using json format to get metrics. Next release will switch to protocol-buffers, switch early by passing --use-protocol-buffers flag
NAME                       CPU(cores)   MEMORY(bytes)
frontend-78764b4d8-5k5ln   0m           1Mi
frontend-78764b4d8-fzmsd   0m           1Mi
frontend-78764b4d8-grdjr   0m           1Mi

asishs-MacBook-Air:hpa$ kubectl top pods
W0621 09:56:48.363569 53619 top_pod.go:140] Using json format to get metrics. Next release will switch to protocol-buffers, switch early by passing --use-protocol-buffers flag
NAME                       CPU(cores)   MEMORY(bytes)
frontend-78764b4d8-5k5ln   38m          2Mi
frontend-78764b4d8-fzmsd   35m          3Mi
frontend-78764b4d8-grdjr   73m          1Mi
There is an increase in CPU and memory usage. In the HPA manifest for this deploy object, we had specified targetCPUUtilizationPercentage: 10
asishs-MacBook-Air:hpa$ kubectl get hpa
NAME           REFERENCE             TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
frontend-hpa   Deployment/frontend   0%/10%    3         10        3          138m

asishs-MacBook-Air:hpa$ k get hpa -w
NAME           REFERENCE             TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
frontend-hpa   Deployment/frontend   0%/10%    3         10        3          138m
frontend-hpa   Deployment/frontend   3%/10%    3         10        3          138m
frontend-hpa   Deployment/frontend   25%/10%   3         10        3          138m
frontend-hpa   Deployment/frontend   4%/10%    3         10        6          138m
frontend-hpa   Deployment/frontend   0%/10%    3         10        8          138m
frontend-hpa   Deployment/frontend   0%/10%    3         10        8          139m
You can see that as soon as the CPU utilization crossed 10%, the HPA scaled up the replicas. This is also evident from the event logs.
asishs-MacBook-Air:hpa$ kubectl get events
LAST SEEN   TYPE     REASON              OBJECT                MESSAGE
26m         Normal   ScalingReplicaSet   deployment/frontend   Scaled up replica set frontend-78764b4d8 to 6
26m         Normal   ScalingReplicaSet   deployment/frontend   Scaled up replica set frontend-78764b4d8 to 8
4. After a while, when the ab test is over, the load on the pods gradually reduces. Once it stays below 10% for the cooldown period, the HPA controller reduces the replica count back to the normal value.
asishs-MacBook-Air:hpa$ k get hpa -w
NAME           REFERENCE             TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
frontend-hpa   Deployment/frontend   0%/10%    3         10        8          145m
frontend-hpa   Deployment/frontend   4%/10%    3         10        6          138m
frontend-hpa   Deployment/frontend   0%/10%    3         10        3
From the event logs:
asishs-MacBook-Air:hpa$ kubectl get events
LAST SEEN   TYPE     REASON              OBJECT                MESSAGE
2m51s       Normal   ScalingReplicaSet   deployment/frontend   Scaled up replica set frontend-78764b4d8 to 6
2m36s       Normal   ScalingReplicaSet   deployment/frontend   Scaled up replica set frontend-78764b4d8 to 8
4m6s        Normal   ScalingReplicaSet   deployment/frontend   Scaled down replica set frontend-78764b4d8 to 3
One thing that is noticeable is that HPA is quick to scale out to handle the extra load, but takes its time to scale in.
HPA has the following default timings (see the flag sketch below):
15 seconds as the interval between metrics checks
3 minutes as the delay between scale-out operations
5 minutes as the stabilization window before scaling in
These values are configurable on the controller side.
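For reference, these timings map to kube-controller-manager flags. A sketch of the relevant ones, with what I understand to be their upstream defaults (worth verifying against your Kubernetes version):

kube-controller-manager \
  --horizontal-pod-autoscaler-sync-period=15s \
  --horizontal-pod-autoscaler-downscale-stabilization=5m0s \
  --horizontal-pod-autoscaler-initial-readiness-delay=30s \
  --horizontal-pod-autoscaler-cpu-initialization-period=5m0s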
HPA thrashing
If HPA monitored the deployment and reacted to every change immediately, it would lead to thrashing: the service would become unstable because pods are added and removed in quick succession.
We need to find a balance where the cluster is responsive to a trend in the metrics rather than to every momentary blip.
We want to scale out fairly quickly to handle spikes and scale in a bit more slowly.
This is accomplished with “cool down” periods: delays between two scale-out or scale-in operations give the cluster a chance to stabilize while honoring other scaling operations.
Best Practices
Resource requests and limits should be specified on the pods. Without them, HPA won't work.
The minimum replica count should be calculated properly and set explicitly.
If your application needs metrics other than CPU and memory, you will have to dig deeper into custom metrics, possibly integrating with a solution like Prometheus.
Keep in mind that your application takes its own time to start up (liveness and readiness probes, for example), so autoscaling is not immediate; a scale-out can take several minutes.
If your cluster cannot handle the load, you might have to consider vertically scaling the nodes or using the cluster autoscaler.
Give a suitable buffer so that your application can handle spikes in traffic while new pods come up.
Your application should be stateless, with no coupling between requests and with short-lived requests.
Conclusion
HPA is a great feature in Kubernetes that adds resilience to your applications and helps absorb quick spikes in traffic. But all of this is limited to the existing cluster capacity; it will not increase your cluster's capacity for you. For right-sizing the pods themselves there is the Vertical Pod Autoscaler, which will be my next topic. Thanks for the read, and feel free to ask questions.
Taken from my own article on medium:
https://asishmm.medium.com/discussion-on-horizontal-pod-autoscaler-with-a-demo-on-local-k8s-cluster-81694c09f818