K8s - Solving a pod that is always Pending
Statement
I’m using colima, kind, and kubebuilder to test an operator project in my local environment. The local kind cluster is set up following Contour_Create_a_Kind_Cluster. The cluster looks like:
>> kubectl get node
NAME                 STATUS   ROLES           AGE   VERSION
kind-control-plane   Ready    control-plane   24h   v1.27.3
kind-worker          Ready    <none>          24h   v1.27.3
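For reference, a minimal kind config that produces a two-node cluster like this looks as follows (a sketch only; the actual config in the Contour guide may differ, e.g. by adding port mappings):

kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
- role: worker

kind create cluster --config kind-config.yaml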
Then I used make run to start and deploy a service. However, when I checked its status with kubectl get all -n ns, I found that the pod was 0/1 READY and its STATUS stayed Pending; the ReplicaSet and Deployment were not ready either.
Troubleshooting
Describe the pod and check the events:
kubectl describe pod pod-name -n ns
Find the events:
Events:
  Type     Reason            Age    From               Message
  ----     ------            ----   ----               -------
  Warning  FailedScheduling  6m21s  default-scheduler  0/2 nodes are available: 1 Insufficient cpu, 1 Insufficient memory, 1 node(s) had untolerated taint {node-role.kubernetes.io/control-plane: }. preemption: 0/2 nodes are available: 1 No preemption victims found for incoming pod, 1 Preemption is not helpful for scheduling..
It means no node is available to schedule the pod, and the message explains why. In my case it points to three issues:
- The node doesn’t have sufficient CPU.
- The node doesn’t have sufficient memory.
- The node has a taint, and the pod has no matching toleration.
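It can also help to look at what the pending pod actually requests, since the scheduler compares those requests against each node’s allocatable resources. A quick sketch (pod-name and ns are placeholders, as above):

kubectl get pod pod-name -n ns -o jsonpath='{range .spec.containers[*]}{.name}{": "}{.resources.requests}{"\n"}{end}'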
So check the node:
kubectl describe node kind-control-plane
kubectl describe node kind-worker
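If you only want the numbers rather than the full description, the Capacity and Allocatable fields can be pulled out directly; a sketch using jsonpath:

kubectl get node kind-worker -o jsonpath='{.status.capacity}{"\n"}{.status.allocatable}{"\n"}'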
For issue #1 and issue #2: in the node descriptions (both kind-control-plane and kind-worker), the Capacity and Allocatable sections show 2 CPUs and 1941184Ki of memory (about 2 GB). These limits come from the colima VM, not from kind (from what I found online, kind itself cannot set node resource limits, so the nodes are bounded by the colima VM). The Conditions section also lists MemoryPressure, DiskPressure and PIDPressure. The colima config can be edited, and the VM restarted, with:
colima start --edit
Expand the CPU count to 4 and the memory to 8 GB; the node’s CPU and memory limits are then updated. (The MemoryPressure, DiskPressure and PIDPressure conditions still appear, but with status False, which is expected: these conditions are always listed, and False simply means the node is not under that pressure.)
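For reference, in the file that colima start --edit opens, the relevant fields look roughly like this (a sketch; key names may vary between colima versions):

cpu: 4
memory: 8

Recent colima versions also accept the same settings as flags, e.g. colima start --cpu 4 --memory 8.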
For issue #3: node kind-control-plane has a taint: Taints: node-role.kubernetes.io/control-plane:NoSchedule. Node kind-worker has no taint. In fact, after applying the changes above, the pod could be scheduled onto the worker node in my case. Just in case, here is how to deal with taints.
There are two ways to resolve the taint:
- Delete the taint on the node (not recommended):
>> kubectl taint nodes kind-control-plane node-role.kubernetes.io/control-plane-
node/kind-control-plane untainted
(mind that there is a - at the end of the taint key)
- Add a taint toleration to the pod:
kubectl edit deployment deployment-name -n ns
Add the toleration under spec in the YAML file:
tolerations:
- key: "special"
  operator: "Equal"
  value: "true"
  effect: "NoSchedule"
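The key/value/effect above are only a generic example. To tolerate the control-plane taint from this cluster specifically (which has no value), the toleration would look like this (a sketch matching the taint shown earlier):

tolerations:
- key: "node-role.kubernetes.io/control-plane"
  operator: "Exists"
  effect: "NoSchedule"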
Then run kubectl describe pod again; this time the default-scheduler scheduled the pod successfully:
Events:
  Type     Reason            Age                  From               Message
  ----     ------            ----                 ----               -------
  Warning  FailedScheduling  46m (x18 over 132m)  default-scheduler  0/2 nodes are available: 2 Insufficient cpu, 2 Insufficient memory. preemption: 0/2 nodes are available: 2 No preemption victims found for incoming pod..
  Warning  FailedScheduling  20m                  default-scheduler  0/2 nodes are available: 2 Insufficient cpu, 2 Insufficient memory. preemption: 0/2 nodes are available: 2 No preemption victims found for incoming pod..
  Normal   Scheduled         17m                  default-scheduler  Successfully assigned vcd-ose/vcd-ose-6377fd78-c7c1-4ad8-a222-41dd23605044-5fd9957587-ldqpz to kind-worker
  Normal   Pulling           17m                  kubelet            Pulling image "***"
  Normal   Pulled            14m                  kubelet            Successfully pulled image "***" in 3m13.030336154s (3m13.03038557s including waiting)
  Normal   Created           14m                  kubelet            Created container ***
  Normal   Started           14m                  kubelet            Started container ***
  Warning  Unhealthy         10m (x22 over 13m)   kubelet            Readiness probe failed: Get "http://10.244.1.3:8080/api/v1/core": dial tcp 10.244.1.3:8080: connect: connection refused
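The final Unhealthy events are the readiness probe failing while the container was still starting up; they are unrelated to the scheduling issue and usually stop once the service begins listening on its port. To watch the pod until it reports Ready (same placeholder namespace ns as above):

kubectl get pods -n ns -w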