List all failed Pods in a namespace with kubectl
At work I came across a script that was intended to print out all “failed” Pods in a Kubernetes namespace. The script did this by selecting Pods whose .status.phase field equals Failed.
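The exact invocation is not preserved, but a minimal sketch of such a command, assuming the script used kubectl's field selector (my-namespace is a placeholder), would be:

```shell
# Select Pods whose lifecycle phase is Failed. This is what the script
# effectively asked for; the exact original invocation is assumed, not known.
kubectl get pods -n my-namespace --field-selector=status.phase=Failed
```

As the rest of this post shows, this selector matches far fewer Pods than you would expect.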
Although the command reads quite logically, it did not print the expected result.
So I started investigating and was immediately confused by this kubectl
output:
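The pair of outputs looked roughly like this (the Pod name, restart count, and age are illustrative; the jsonpath expression is the standard way to print a single status field):

```shell
$ kubectl get pods
NAME    READY   STATUS             RESTARTS   AGE
mypod   0/1     CrashLoopBackOff   5          10m

$ kubectl get pod mypod -o jsonpath='{.status.phase}'
Running
```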
Whoa, how can this be? kubectl get pods says the Pod is in CrashLoopBackOff, but when manually printing out the field, the Pod is Running?
It turns out that the .status.phase field describes where the Pod is in its scheduling lifecycle, not whether its containers are actually healthy.
The Stack Overflow answers to the question “How to get status of a particular pod” unfortunately recommend the wrong command, too.
$ kubectl explain pod.status.phase
KIND:     Pod
VERSION:  v1

FIELD:    phase <string>

DESCRIPTION:
     The phase of a Pod is a simple, high-level summary of where the Pod is in
     its lifecycle. The conditions array, the reason and message fields, and
     the individual container status arrays contain more detail about the
     pod's status. There are five possible phase values:

     Pending: The pod has been accepted by the Kubernetes system, but one or
     more of the container images has not been created. This includes time
     before being scheduled as well as time spent downloading images over the
     network, which could take a while.

     Running: The pod has been bound to a node, and all of the containers have
     been created. At least one container is still running, or is in the
     process of starting or restarting.

     Succeeded: All containers in the pod have terminated in success, and will
     not be restarted.

     Failed: All containers in the pod have terminated, and at least one
     container has terminated in failure. The container either exited with
     non-zero status or was terminated by the system.

     Unknown: For some reason the state of the pod could not be obtained,
     typically due to an error in communicating with the host of the pod.

     More info:
     https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle#pod-phase
So while a Pod can be in the lifecycle phase Running, it might still be failing (for example because it keeps crashing).
And unless your Pod has restartPolicy: Never, it will never reach the Failed phase.
Next try: the .status.reason
field – maybe this will print the desired CrashLoopBackOff
shown above?
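Printing the field can be sketched as follows (mypod is a placeholder Pod name):

```shell
# Print .status.reason; for a crash-looping Pod this field is typically
# not set at all, so the command prints nothing.
kubectl get pod mypod -o jsonpath='{.status.reason}'
```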
No, unfortunately not. Like .status.phase, the .status.reason
field describes the Pod as a whole (for example, Evicted) and says nothing about individual containers.
To actually figure out the reason why the Pod is not available, we need to go much deeper and look at its individual containers:
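The original command is not reproduced here, but a sketch using a Go template (my-namespace is a placeholder; one output line per waiting container) looks like this:

```shell
# For every container of every Pod, print "<pod>: <reason>" whenever the
# container is currently in the "waiting" state (e.g. CrashLoopBackOff).
# This is a reconstruction, not the author's exact command.
kubectl get pods -n my-namespace -o go-template='{{range $pod := .items}}{{range .status.containerStatuses}}{{if .state.waiting}}{{$pod.metadata.name}}: {{.state.waiting.reason}}{{"\n"}}{{end}}{{end}}{{end}}'
```

For the crash-looping Pod from earlier, this would print something like mypod: CrashLoopBackOff.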
This command actually looks at the status of each individual container in each Pod and selects those that are currently in the “waiting” state for some reason. A waiting container is one that is neither running nor terminated — typically because of a condition such as CrashLoopBackOff, ImagePullBackOff, or ErrImagePull — and its reason field tells you why.
# References:
https://kubernetes.io/docs/reference/kubectl/cheatsheet/
https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/