Understanding Horizontal Pod Autoscaler using a demo on a local k8s cluster

“Resilience is our ability to bounce back from life’s challenges and to thrive, grow and expand.”

HorizontalPodAutoscaler (HPA) is a separate resource type in Kubernetes that scales the number of pods based on CPU utilization, memory utilization or custom metrics. HPA helps optimise the number of replicas maintained for your applications, which helps distribute load. Behind the scenes, the autoscaler controller updates the replica count of the target resource, such as a Deployment, ReplicaSet or StatefulSet.

The value it brings to the table is a more resilient application, one that can take care of itself when demand increases. However, there must be enough physical resources for the pods to scale out. In this article I will try to explain HPA along with a demo.

The HPA controller periodically checks metrics. When the average CPU or memory utilization goes above the target, it tells Kubernetes to increase the replica count of the target deployment. So it first needs a way to get metrics from the cluster, which means connecting the controller to a metrics collector. Values for CPU, memory etc. are averaged across your pods, and a utilization target is measured as a percentage of the pods' resource requests, so the most important thing is to have resource requests defined in your deployments. Then provide the minimum and maximum pod count in the HPA configuration. HPA also scales down, but only after a cooldown period (a stabilization window of five minutes by default). It uses the formula below to calculate the number of replicas to maintain.

Algorithm to calculate Replicas

desired_replicas = ceil(current_replicas * (current_value / target_value))
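To make the formula concrete, here is a minimal Python sketch of it (my own illustration, not the controller's source). It assumes the current and target values are passed as integer percentages, and it also applies the HPA's default tolerance of 0.1: when current/target is that close to 1.0, the controller leaves the replica count alone. The name `desired_replicas` is hypothetical.

```python
import math
from fractions import Fraction

# Default HPA tolerance: skip scaling when current/target is within 0.1 of 1.0.
TOLERANCE = Fraction(1, 10)


def desired_replicas(current_replicas: int, current_value: int, target_value: int) -> int:
    """ceil(current_replicas * (current_value / target_value)), with tolerance.

    current_value and target_value are integer percentages (90 means 90%).
    Fraction keeps the division exact so ceil() is not affected by float rounding.
    """
    ratio = Fraction(current_value, target_value)
    if abs(ratio - 1) <= TOLERANCE:
        return current_replicas  # close enough to the target: keep replicas as-is
    return math.ceil(current_replicas * ratio)


if __name__ == "__main__":
    print(desired_replicas(3, 90, 60))  # 5
    print(desired_replicas(4, 65, 60))  # 4: 65/60 is within the tolerance
```

Using `Fraction` instead of floats is a design choice for the sketch: it keeps the result of `ceil()` from depending on floating-point rounding.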

To understand this, let's consider two scenarios in which a scaling decision needs to be made.

Scenario 1

Assume that an application has some business requirements, and based on the physical limits we come up with some ideal CPU/memory numbers. We wish to keep the average CPU utilization of the pods (relative to their CPU requests) below 60%. Then a spike in traffic pushes the current utilization to 90%. The deployment is running 3 replicas and the pods are taking heavy load. Let's use the algorithm to find the desired pod count.

Target CPU utilization : 60%
Current Utilization: 90%
Current pods: 3
Desired Pods = ceil(current_pods * (current value/ target value))
Desired Pods = ceil(3*(.9/.6)) = ceil(4.5) = 5

In this scenario, the HPA controller changes the deployment's replicas to 5. The deployment (through its ReplicaSet) then creates 2 more pods, and the scheduler places them on nodes.

Scenario 2

Let's assume that after some time the traffic has come down, and the current utilization has dropped to 20%. We want the additional replicas to be removed automatically. Let's see how the HPA controller grants our wish.

Target CPU Utilization : 60%
Current Utilization: 20%
Current pods : 5
Desired Pods = ceil(5*(.2/.6)) = 2

Here the formula determines that only 2 replicas are sufficient. But the HPA has a minimum replica count (minReplicas) of 3, so the controller honors that floor and scales the deployment down from 5 to 3 rather than 2.
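Both scenarios can be checked with a small Python sketch that applies the formula and then clamps the result to the HPA's minReplicas/maxReplicas bounds. The helper `recommend` is hypothetical, maxReplicas = 10 is an assumption (the scenarios don't state one), and the sketch ignores the tolerance and the scale-down stabilization window that the real controller also applies.

```python
import math
from fractions import Fraction


def recommend(current_replicas: int, current_pct: int, target_pct: int,
              min_replicas: int, max_replicas: int) -> tuple:
    """Return (formula result, replica count after clamping to [min, max])."""
    raw = math.ceil(current_replicas * Fraction(current_pct, target_pct))
    return raw, max(min_replicas, min(max_replicas, raw))


if __name__ == "__main__":
    # Scenario 1: 3 pods at 90% against a 60% target.
    print(recommend(3, 90, 60, min_replicas=3, max_replicas=10))  # (5, 5)
    # Scenario 2: 5 pods at 20% against a 60% target; minReplicas wins.
    print(recommend(5, 20, 60, min_replicas=3, max_replicas=10))  # (2, 3)
```

The same clamp works in the other direction: a large enough spike is capped at maxReplicas no matter what the formula returns.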

Lab Setup

I am going to use a “kind” cluster on my local laptop. KIND is Kubernetes in Docker, which is good for demos and for testing Kubernetes applications. KIND gives you the flexibility to add worker nodes and manage multiple clusters. For more details, check out my article on KIND.

I am using KIND's extraPortMappings feature when creating the cluster, to forward ports from the host to the ingress controller. We will be using the NGINX ingress controller.


kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
name: hpacluster
nodes:
- role: control-plane
  kubeadmConfigPatches:
  - |
    kind: InitConfiguration
    nodeRegistration:
      kubeletExtraArgs:
        node-labels: "ingress-ready=true"
  extraPortMappings:
  - containerPort: 80
    hostPort: 80
    protocol: TCP
  - containerPort: 443
    hostPort: 443
    protocol: TCP
- role: worker
- role: worker
- role: worker
- role: worker

Create the k8s cluster

asishs-MacBook-Air:kind$ kind create cluster --config hpa-lab.yaml
Creating cluster "hpacluster" ...
 ✓ Ensuring node image (kindest/node:v1.21.1) 🖼
 ✓ Preparing nodes 📦 📦 📦 📦 📦
 ✓ Writing configuration 📜
 ✓ Starting control-plane 🕹️
 ✓ Installing CNI 🔌
 ✓ Installing StorageClass 💾
 ✓ Joining worker nodes 🚜
Set kubectl context to "kind-hpacluster"
You can now use your cluster with:

kubectl cluster-info --context kind-hpacluster

Have a nice day! 👋

Check the cluster.

Let's create our Deployment object. I am using an nginx image for this.


apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: frontend
  name: frontend
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: frontend
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: frontend
    spec:
      containers:
      - image: nginx
        imagePullPolicy: Always
        name: nginx
        ports:
        - containerPort: 80
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30


apiVersion: v1
kind: Service
metadata:
  labels:
    app: frontend
  name: frontend-svc
spec:
  ports:
  - name: http
    port: 80
    protocol: TCP
    targetPort: 80
  selector:
    app: frontend

Install the NGINX ingress controller manifest with the KIND-specific patches. The manifest contains patches that forward the hostPorts to the ingress controller.

kubectl -n ingress-nginx apply -f

Wait until the ingress controller is ready to process requests.

kubectl wait --namespace ingress-nginx \
  --for=condition=ready pod \
  --selector=app.kubernetes.io/component=controller \
  --timeout=90s

Create the ingress manifest for the frontend service.


apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: frontend-ingress
spec:
  rules:
  - http:
      paths:
      - path: /
        backend:
          serviceName: frontend-svc
          servicePort: 80

After applying the above objects, we should be able to reach the nginx service on localhost.

asishs-MacBook-Air:hpa$ kubectl get pods
NAME                        READY   STATUS    RESTARTS   AGE
frontend-86968456b9-p7nc2   1/1     Running   0          60m
asishs-MacBook-Air:hpa$ kubectl get svc
frontend-svc   ClusterIP   <none>        80/TCP    60m
kubernetes     ClusterIP      <none>        443/TCP   61m
asishs-MacBook-Air:hpa$ kubectl get ingress
NAME               CLASS    HOSTS   ADDRESS     PORTS   AGE
frontend-ingress   <none>   *       localhost   80      57m
asishs-MacBook-Air:hpa$ curl -I http://localhost
HTTP/1.1 200 OK
Date: Sun, 20 Jun 2021 16:28:19 GMT
Content-Type: text/html
Content-Length: 612
Connection: keep-alive
Last-Modified: Tue, 25 May 2021 12:28:56 GMT
ETag: "60aced88-264"
Accept-Ranges: bytes

Let's add our HPA manifest and see what happens when we apply it directly.


apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: frontend-hpa
  namespace: default
spec:
  minReplicas: 3
  maxReplicas: 10
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: frontend
  targetCPUUtilizationPercentage: 10

We are telling HPA to keep the average CPU utilization at 10%. Create the HPA object.

asishs-MacBook-Air:hpa$ kubectl apply -f hpa.yaml
horizontalpodautoscaler.autoscaling/frontend-hpa created
asishs-MacBook-Air:hpa$ kubectl get hpa
NAME           REFERENCE             TARGETS         MINPODS   MAXPODS   REPLICAS   AGE
frontend-hpa   Deployment/frontend   <unknown>/10%   3         10        1          29s

Notice that the target percentage is shown as “unknown”, which means the HPA controller is not able to get resource metrics for this deployment. A basic metrics server is enough to capture CPU and memory metrics. If we need more advanced or custom metrics, we can consider a monitoring solution like Prometheus.
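To see what the HPA is missing, here is a simplified Python sketch of how a CPU utilization percentage can be derived: the total CPU usage of the pods divided by their total CPU requests. The function name and the millicore figures are hypothetical, and returning `None` stands in for the “unknown” shown above. The sketch also shows why requests matter: the frontend Deployment above sets `resources: {}`, and without a CPU request there is nothing to compute a percentage against.

```python
from typing import List, Optional


def cpu_utilization_pct(usage_m: List[int], requests_m: List[Optional[int]]) -> Optional[int]:
    """Return total CPU usage as an integer percentage of total CPU requests.

    usage_m: per-pod CPU usage in millicores (what a metrics API would report).
    requests_m: per-pod CPU request in millicores, None when a pod sets no request.
    """
    if not usage_m:
        return None  # no metrics available (e.g. no metrics server installed)
    if any(not r for r in requests_m):
        return None  # a pod has no (or a zero) CPU request: percentage is undefined
    return sum(usage_m) * 100 // sum(requests_m)


if __name__ == "__main__":
    # Hypothetical numbers: three pods, each requesting 100m of CPU.
    print(cpu_utilization_pct([30, 10, 20], [100, 100, 100]))  # 20
    # Same usage, but the pods set no requests (as with resources: {}).
    print(cpu_utilization_pct([30, 10, 20], [None, None, None]))  # None
    # No metrics at all.
    print(cpu_utilization_pct([], [100, 100, 100]))  # None
```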

Let's install Metrics Server in our cluster.

asishs-MacBook-Air:hpa$ kubectl apply -f
serviceaccount/metrics-server created created created created created created
service/metrics-server created
deployment.apps/metrics-server created created

Check that we can gather metrics from the cluster.

asishs-MacBook-Air:hpa$ k top nodes
W0620 22:53:30.277142   49629 top_node.go:119] Using json format to get metrics. Next release will switch to protocol-buffers, switch early by passing --use-protocol-buffers flag
Error from server (ServiceUnavailable): the server is currently unable to handle the request (get

When I check the logs of the metrics-server pod, I see some certificate-related errors:

E0620 17:29:41.525715       1 scraper.go:139] "Failed to scrape node" err="Get \"\": x509: cannot validate certificate for because it doesn't contain any IP SANs" node="hpacluster-worker3"
E0620 17:29:41.534082       1 scraper.go:139] "Failed to scrape node" err="Get \"\": x509: cannot validate certificate for because it doesn't contain any IP SANs" node="hpacluster-worker4"

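Let's work around this error by adding --kubelet-insecure-tls, which skips verification of the kubelet serving certificates. This is fine for a local test cluster but should not be used in production. Below is the complete set of args for the metrics-server deployment manifest which I am using.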

      - args:
        - --cert-dir=/tmp
        - --secure-port=443
        - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
        - --kubelet-use-node-status-port
        - --metric-resolution=15s
        - --kubelet-insecure-tls

Now let's check Metrics Server again.

asishs-MacBook-Air:hpa$ kubectl top nodes
W0621 07:24:43.894564   52298 top_node.go:119] Using json format to get metrics. Next release will switch to protocol-buffers, switch early by passing --use-protocol-buffers flag
NAME                       CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
hpacluster-control-plane   184m         4%     573Mi           28%