List all failed Pods in a namespace with kubectl
At work I came across a script that was intended to print out all “failed” Pods in a Kubernetes namespace. The script did this by selecting Pods whose .status.phase field equals Failed.
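The exact invocation is not preserved, but a minimal sketch of such a command, assuming the script used kubectl's field selector (my-namespace is a placeholder), would be:

```shell
# Select Pods whose lifecycle phase is Failed. This is what the script
# effectively asked for; the exact original invocation is assumed, not known.
kubectl get pods -n my-namespace --field-selector=status.phase=Failed
```

As the rest of this post shows, this selector matches far fewer Pods than you would expect.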
Although the command reads quite logically, it did not print the expected result.
So I started investigating and was immediately confused by this kubectl
output:
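The pair of outputs looked roughly like this (the Pod name, restart count, and age are illustrative; the jsonpath expression is the standard way to print a single status field):

```shell
$ kubectl get pods
NAME    READY   STATUS             RESTARTS   AGE
mypod   0/1     CrashLoopBackOff   5          10m

$ kubectl get pod mypod -o jsonpath='{.status.phase}'
Running
```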
Whoa, how can this be? kubectl get pods says the Pod is in CrashLoopBackOff, but when manually printing out the field, the Pod is Running?
It turns out that the .status.phase field describes where the Pod is in its scheduling lifecycle, not whether its containers are actually healthy.
The Stack Overflow answers to the question “How to get status of a particular pod” unfortunately recommend the wrong command, too.
$ kubectl explain pod.status.phase
KIND:     Pod
VERSION:  v1

FIELD:    phase <string>

DESCRIPTION:
     The phase of a Pod is a simple, high-level summary of where the Pod is in
     its lifecycle. The conditions array, the reason and message fields, and
     the individual container status arrays contain more detail about the
     pod's status. There are five possible phase values:

     Pending: The pod has been accepted by the Kubernetes system, but one or
     more of the container images has not been created. This includes time
     before being scheduled as well as time spent downloading images over the
     network, which could take a while.

     Running: The pod has been bound to a node, and all of the containers have
     been created. At least one container is still running, or is in the
     process of starting or restarting.

     Succeeded: All containers in the pod have terminated in success, and will
     not be restarted.

     Failed: All containers in the pod have terminated, and at least one
     container has terminated in failure. The container either exited with
     non-zero status or was terminated by the system.

     Unknown: For some reason the state of the pod could not be obtained,
     typically due to an error in communicating with the host of the pod.

     More info:
     https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle#pod-phase
So while a Pod can be in the lifecycle phase Running, it might still be failing (for example because it keeps crashing).
And unless your Pod has restartPolicy: Never, it will never reach the Failed phase.
Next try: the .status.reason
field – maybe this will print the desired CrashLoopBackOff
shown above?
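Printing the field can be sketched as follows (mypod is a placeholder Pod name):

```shell
# Print .status.reason; for a crash-looping Pod this field is typically
# not set at all, so the command prints nothing.
kubectl get pod mypod -o jsonpath='{.status.reason}'
```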
No, unfortunately not. Like .status.phase, the .status.reason
field describes the Pod as a whole (for example, Evicted) and says nothing about individual containers.
To actually figure out the reason why the Pod is not available, we need to go much deeper and look at its individual containers:
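The original command is not reproduced here, but a sketch using a Go template (my-namespace is a placeholder; one output line per waiting container) looks like this:

```shell
# For every container of every Pod, print "<pod>: <reason>" whenever the
# container is currently in the "waiting" state (e.g. CrashLoopBackOff).
# This is a reconstruction, not the author's exact command.
kubectl get pods -n my-namespace -o go-template='{{range $pod := .items}}{{range .status.containerStatuses}}{{if .state.waiting}}{{$pod.metadata.name}}: {{.state.waiting.reason}}{{"\n"}}{{end}}{{end}}{{end}}'
```

For the crash-looping Pod from earlier, this would print something like mypod: CrashLoopBackOff.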
This command actually looks at the status of each individual container in each Pod and selects those that are currently in the “waiting” state for some reason. A waiting container is one that is neither running nor terminated — typically because of a condition such as CrashLoopBackOff, ImagePullBackOff, or ErrImagePull — and its reason field tells you why.
# References:
https://kubernetes.io/docs/reference/kubectl/cheatsheet/
https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/