Degraded Operator-Lifecycle-Manager-Packageserver ClusterOperator on OpenShift
Last week all of our OpenShift (OKD) clusters started alerting us about the same degraded condition.
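The condition can be read directly from the ClusterOperator resource. What follows is a minimal sketch, assuming `oc` is logged in to the affected cluster with sufficient rights; the function name is made up for this post.

```shell
# Hypothetical helper: print type, status and message of every condition
# reported by the packageserver cluster operator.
packageserver_conditions() {
  oc get clusteroperator operator-lifecycle-manager-packageserver \
    -o jsonpath='{range .status.conditions[*]}{.type}={.status}{"\t"}{.message}{"\n"}{end}'
}
```

Running `packageserver_conditions` should list the `Available`, `Progressing` and `Degraded` conditions together with their messages.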
The clusters were running fine and user workloads were not affected, but one cluster operator, operator-lifecycle-manager-packageserver, was degraded, and the reported reason pointed at its serving certificate.
Searching the web revealed that we are not the only ones encountering this issue (see the references at the bottom). To allow secure, encrypted communication between the OLM package server and the rest of the control plane, a certificate is generated. OpenShift is usually very good at rotating certificates automatically before they expire, but that was not the case here (upstream bug: OCPBUGS-25341).
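To see whether the certificate has really expired, its validity dates can be read from the secret that stores it. This is a sketch under assumptions: the secret name `packageserver-service-cert`, the namespace `openshift-operator-lifecycle-manager` and the `tls.crt` key are not stated in this post and should be checked against your cluster, and the function name is hypothetical.

```shell
# Assumed location of the packageserver serving certificate (verify on your cluster).
OLM_NAMESPACE="openshift-operator-lifecycle-manager"
CERT_SECRET="packageserver-service-cert"

# Hypothetical helper: decode the certificate from the secret and print
# its notBefore / notAfter dates.
packageserver_cert_dates() {
  oc get secret "$CERT_SECRET" -n "$OLM_NAMESPACE" \
    -o jsonpath='{.data.tls\.crt}' | base64 -d | openssl x509 -noout -dates
}
```

If the `notAfter` date printed by `packageserver_cert_dates` lies in the past, the certificate was not rotated in time.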
Generating a fresh certificate is easy enough: delete the existing secret and a new one is generated automatically.
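A minimal sketch of that step follows. The secret name and namespace are assumptions (they match what the linked Red Hat articles describe, but are not given in this post), and the function name is hypothetical.

```shell
# Assumed location of the packageserver serving certificate (verify on your cluster).
OLM_NAMESPACE="openshift-operator-lifecycle-manager"
CERT_SECRET="packageserver-service-cert"

# Hypothetical helper: delete the secret so that OLM issues a fresh certificate.
rotate_packageserver_cert() {
  oc delete secret "$CERT_SECRET" -n "$OLM_NAMESPACE"
}
```

Only the secret is deleted here; nothing else is touched, and the replacement is created by the cluster itself.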
After a couple of seconds a new certificate is generated, the Operator Lifecycle Manager picks it up automatically, and the control plane is happy again.
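Rather than polling by hand, the recovery can be awaited with `oc wait`. This is only a sketch: the five-minute timeout is an arbitrary choice and the function name is hypothetical.

```shell
# Hypothetical helper: block until the packageserver cluster operator
# reports Degraded=False, or give up after five minutes.
wait_for_packageserver() {
  oc wait clusteroperator/operator-lifecycle-manager-packageserver \
    --for=condition=Degraded=False --timeout=5m
}
```

`oc wait` exits with a non-zero status if the timeout elapses first, so `wait_for_packageserver` can also be used in a script.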
Why did this happen today? And why on all of the clusters at the same time? Some research revealed that the Operator Lifecycle Manager package-server-manager component was first introduced with OpenShift 4.9. This can be confirmed by looking at the creation timestamp of the related namespace and comparing it to the date we upgraded our clusters to release 4.9.
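Both timestamps can be queried with `oc`. The sketch below makes a few assumptions: the namespace is passed in as an argument because this post does not name it, the function names are hypothetical, and the upgrade date is taken from the `status.history` list of the `ClusterVersion` resource named `version`.

```shell
# Hypothetical helper: print the creation timestamp of the namespace given as $1.
namespace_created_at() {
  oc get namespace "$1" -o jsonpath='{.metadata.creationTimestamp}{"\n"}'
}

# Hypothetical helper: list version, start time and completion time of every
# 4.9.x entry in the cluster's upgrade history.
upgrade_times_for_4_9() {
  oc get clusterversion version \
    -o jsonpath='{range .status.history[*]}{.version}{"\t"}{.startedTime}{"\t"}{.completionTime}{"\n"}{end}' |
    grep '^4\.9\.'
}
```

Calling `namespace_created_at <namespace>` and `upgrade_times_for_4_9` side by side makes the two dates easy to compare.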
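The timestamps match! The component, and presumably its certificate with it, arrived on every cluster during the same upgrade, which explains why all our clusters (which are deployed in completely isolated environments) encountered this condition at the same time.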
# References
- Cluster operator operator-lifecycle-manager-packageserver is stuck in progressing state
- How to renew/rotate OLM packageserver apiservice CAbundle in Red Hat OpenShift Container Platform 4
- The certificate relating to operator-lifecycle-manager-packageserver isn’t rotated after expired
- CSV packageserver in openshift-operator-lifecycle-manager “found the serving cert not active”