At work I came across a script that was supposed to print out all “failed” Pods in a Kubernetes namespace. The script was executing the following command:
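One common way to write such a check, an assumption here rather than the script's verbatim invocation, is to filter Pods on the phase field:

```shell
# Hypothetical reconstruction: select Pods whose lifecycle phase is "Failed".
# As the rest of the post explains, this filter misses crash-looping Pods.
kubectl get pods --field-selector=status.phase=Failed
```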
Although the command reads quite logically, it didn’t print the expected result.
So I started investigating and was immediately confused by this:
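The confusing behaviour can be reproduced with two queries against the same Pod (the pod name and listing below are hypothetical):

```shell
# The STATUS column of "kubectl get pods" reflects the container state:
kubectl get pods
# NAME      READY   STATUS             RESTARTS   AGE
# my-pod    0/1     CrashLoopBackOff   17         42m

# ...but printing the phase field of the very same Pod says it is running:
kubectl get pod my-pod -o jsonpath='{.status.phase}'
# Running
```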
Whoa, how can this be? kubectl get pods says the Pod is in CrashLoopBackOff, but when manually printing out the field the Pod is running?
It turns out that the .status.phase field describes the Pod’s high-level lifecycle (essentially its scheduling state), not the actual state of its containers.
The StackOverflow answers for the post “How to get status of a particular pod” unfortunately recommend the wrong command, too.
$ kubectl explain pod.status.phase
KIND:     Pod
VERSION:  v1

FIELD:    phase <string>

DESCRIPTION:
     The phase of a Pod is a simple, high-level summary of where the Pod is in
     its lifecycle. The conditions array, the reason and message fields, and
     the individual container status arrays contain more detail about the
     pod's status. There are five possible phase values:

     Pending: The pod has been accepted by the Kubernetes system, but one or
     more of the container images has not been created. This includes time
     before being scheduled as well as time spent downloading images over the
     network, which could take a while.

     Running: The pod has been bound to a node, and all of the containers have
     been created. At least one container is still running, or is in the
     process of starting or restarting.

     Succeeded: All containers in the pod have terminated in success, and will
     not be restarted.

     Failed: All containers in the pod have terminated, and at least one
     container has terminated in failure. The container either exited with
     non-zero status or was terminated by the system.

     Unknown: For some reason the state of the pod could not be obtained,
     typically due to an error in communicating with the host of the pod.

     More info:
     https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle#pod-phase
So while a Pod can be in the lifecycle phase Running, it might still be failing (for example because it crashes).
And unless your Pod has the restart policy Never, it will never enter the Failed phase.
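To make the distinction concrete, here is a trimmed Pod status for a crash-looping container (all values hypothetical), in the shape that kubectl get pod -o json would return it. The phase lookup and the container-state lookup disagree exactly as described, so the lookups are simulated locally with python3:

```shell
# Hypothetical, trimmed Pod status: restartPolicy Always keeps restarting the
# crashing container, so the lifecycle phase stays "Running".
cat > /tmp/pod.json <<'EOF'
{
  "status": {
    "phase": "Running",
    "containerStatuses": [
      {
        "name": "app",
        "restartCount": 17,
        "state": {
          "waiting": {
            "reason": "CrashLoopBackOff",
            "message": "back-off 5m0s restarting failed container"
          }
        }
      }
    ]
  }
}
EOF

# Equivalent of: kubectl get pod my-pod -o jsonpath='{.status.phase}'
python3 -c 'import json; print(json.load(open("/tmp/pod.json"))["status"]["phase"])'

# Equivalent of:
# kubectl get pod my-pod -o jsonpath='{.status.containerStatuses[0].state.waiting.reason}'
python3 -c 'import json; print(json.load(open("/tmp/pod.json"))["status"]["containerStatuses"][0]["state"]["waiting"]["reason"])'
```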
Next try: the .status.reason field – maybe this will print the desired CrashLoopBackOff shown above?
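The lookup would be (pod name hypothetical):

```shell
# Query the Pod-level reason field of a crash-looping Pod.
kubectl get pod my-pod -o jsonpath='{.status.reason}'
```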
No, unfortunately not. Again, the .status.reason field is populated by the scheduler.
To actually figure out the reason why the Pod is not available, we need to go much deeper and look at its individual containers:
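One way to do this, an assumption rather than the script's verbatim command, is to range over all container statuses with kubectl's jsonpath output:

```shell
# Print each Pod's name together with the waiting reason (if any) of its
# containers, e.g. CrashLoopBackOff or ImagePullBackOff.
kubectl get pods -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.containerStatuses[*].state.waiting.reason}{"\n"}{end}'
```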
This command actually looks at the status of each individual container in each Pod and searches for those that are currently “waiting” for some reason. A waiting container is one that is not currently running, for example because its image cannot be pulled (ImagePullBackOff) or because it keeps crashing and sits in restart back-off (CrashLoopBackOff), so the query captures exactly the containers that never come up, regardless of what the Pod’s phase claims.
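The same filtering logic can be checked offline against a saved Pod list (kubectl get pods -o json > pods.json); the sample data below is hypothetical:

```shell
# Hypothetical Pod list in the shape returned by: kubectl get pods -o json
cat > /tmp/pods.json <<'EOF'
{
  "items": [
    {"metadata": {"name": "healthy-pod"},
     "status": {"phase": "Running",
                "containerStatuses": [{"name": "app", "state": {"running": {}}}]}},
    {"metadata": {"name": "crashing-pod"},
     "status": {"phase": "Running",
                "containerStatuses": [{"name": "app",
                  "state": {"waiting": {"reason": "CrashLoopBackOff"}}}]}}
  ]
}
EOF

# Print "<pod> <container> <reason>" for every container stuck in a waiting state.
python3 - <<'EOF'
import json

pods = json.load(open("/tmp/pods.json"))["items"]
for pod in pods:
    for cs in pod["status"].get("containerStatuses", []):
        waiting = cs["state"].get("waiting")
        if waiting:
            print(pod["metadata"]["name"], cs["name"], waiting.get("reason", ""))
EOF
```

Note that only crashing-pod is reported, even though both Pods are in phase Running.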