I recently gave a lightning talk about Steampipe at CERN.
The recording is available here, the following is a transcript of the talk.
Cloud infrastructures tends to have lots of moving pieces: containers, loadbalancers, virtual machines, databases etc.
Steampipe is a tool that allows querying all these pieces through a single interface with SQL.
Are you tired of writing brittle Bash and JQ scripts? Then this is the tool for you!
The well-known tool for dealing with JSON APIs of any kind (AWS, Gitlab, Kubernetes, …) is jq: it allows filtering, modifying and even creating JSON on the command line.

Here’s an example. Given the following output from the OpenStack CLI (which shows all sorts of details about VMs):
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
| $ openstack server list -f json
[
{
"ID": "06f3a71e-ecc5-42da-94f4-2af8646036a5",
"Name": "webeos-proto-tzv8k",
"Status": "ACTIVE",
"Networks": {
"pbulic": [
"122.122.122.122"
]
},
"Image": "fedora-coreos-33.20210217.3.0-openstack.x86_64",
"Flavor": "m2.xlarge"
},
{
"ID": "90d7b567-e95c-40f9-be12-68d52d8438af",
"Name": "infra-4rvnj",
"Status": "ACTIVE",
"Networks": {
"public": [
"123.123.123.123"
]
},
"Image": "fedora-coreos-33.20210217.3.0-openstack.x86_64",
"Flavor": "m2.xlarge"
},
...
|
we can extract only the disk images used for those VMs and count their occurences:
1
2
3
4
5
6
7
| $ openstack server list -f json | jq '.[] | .Image' | sort | uniq -c
4 ""
4 "fedora-coreos-33.20210217.3.0-openstack.x86_64"
7 "fedora-coreos-34.20210904.3.0-openstack.x86_64"
9 "fedora-coreos-35.20220327.3.0-openstack.x86_64"
60 "fedora-coreos-36.20220716.3.1-openstack.x86_64"
6 "fedora-coreos-37.20221127.3.0-openstack.x86_64"
|
Or query all pods deployed in a Kubernetes cluster that do not deploy from docker.io
:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
| $ oc get pods -A -o json | \
jq -r '.items[].spec.containers[] | select(.image | startswith("docker.io") | not) | "\(.name): \(.image)"' | \
sort -u
cephfs-provisioner: k8s.gcr.io/sig-storage/csi-provisioner:v3.0.0
cephfs-registrar: k8s.gcr.io/sig-storage/csi-node-driver-registrar:v2.4.0
cephfs-resizer: k8s.gcr.io/sig-storage/csi-resizer:v1.3.0
cephfs-snapshotter: k8s.gcr.io/sig-storage/csi-snapshotter:v5.0.1
cert-manager: quay.io/jetstack/cert-manager-cainjector:v1.5.5
cert-manager: quay.io/jetstack/cert-manager-controller:v1.5.5
cert-manager: quay.io/jetstack/cert-manager-webhook:v1.5.5
csi-provisioner: registry.k8s.io/sig-storage/csi-provisioner:v3.2.1
csi-resizer: registry.k8s.io/sig-storage/csi-resizer:v1.5.0
csi-snapshotter: registry.k8s.io/sig-storage/csi-snapshotter:v6.0.1
driver-registrar: registry.k8s.io/sig-storage/csi-node-driver-registrar:v2.5.1
external-dns: registry.k8s.io/external-dns/external-dns:v0.13.2
garbage-collector: image-registry.openshift-image-registry.svc:5000/openshift/cli:latest
node-problem-detector: k8s.gcr.io/node-problem-detector/node-problem-detector:v0.8.7
provisioner: k8s.gcr.io/sig-storage/csi-provisioner:v3.2.1
registrar: registry.k8s.io/sig-storage/csi-node-driver-registrar:v2.5.1
registry-server: registry.access.redhat.com/redhat/community-operator-index:v4.11
sitedetails-operator: gitlab-registry.cern.ch/paas-tools/operators/sitedetails-operator:RELEASE.2023.05.08T16-05-52Z
user-alerts-adapter: gitlab-registry.cern.ch/paas-tools/okd4-deployment/user-alerts-adapter:RELEASE.2023.03.08T10-57-46Z
webservices-blocker-controller: gitlab-registry.cern.ch/paas-tools/okd4-deployment/blocker-controller:RELEASE.2023.01.25T12-40-44Z
|
Or this recent example of query that took me multiple hours to put together because it’s very difficult to use multiple data sources with a JQ query.
1
2
3
4
5
6
7
8
9
10
11
| # get full JSON of all PVCs with type "cvmfs"
oc get pvc -A -o json | jq -r '{items: [.items[] | select(.spec.storageClassName | startswith("cvmfs-")?) | .]}' > cvmfs-pvcs.json
# get all running pods
oc get pods -A --field-selector=status.phase=Running -o json > running-pods.json
# obtain comma-separated and quoted list of all PVCs using CVMFS
cvmfs_pvcs_list=$(jq -c '[.items[]| .metadata.name]' cvmfs-pvcs.json | tr -d \[\])
# identify on which nodes these volumes are mounted
nodes_with_cvmfs=$(jq -r '.items[] | select(.spec.volumes[].persistentVolumeClaim.claimName | IN('$cvmfs_pvcs_list')) | .spec.nodeName' running-pods.json | sort -u)
|
We can see that jq
is really handy for performing quick look ups, but there are two drawbacks:
- data always flows linearly from left to right, and all data needs to be present in the initial input (no dynamic lookups)
- the query language is completely custom and needs to learned from scratch (usually including a lot of web searches…)

Steampipe allows querying “Cloud” resources with regular SQL
Here are some data sources that Steampipe supports with plugins:
- Cloud providers: AWS, Azure, DigitalOcean, Equinex, Fly.io, GCP, Hetzner, IBM Cloud, Linode, OVH
- Kubernetes, Keycloak, LDAP, Nomad
- Local files: INI, JSON, YAML, CSV
- VCS: GitLab, GitHub, Bitbucket
- Jira, Confluence
- Microsoft 365
Set up
Head to https://steampipe.io/downloads for up-to-date installation instructions.
1
2
3
4
5
6
7
8
| $ steampipe -v
Steampipe v0.20.3
$ steampipe plugin install turbot/kubernetes theapsgroup/gitlab
$ export KUBE_CONFIG_PATHS=<PATH_TO_YOUR_KUBECONFIG>
$ export GITLAB_ADDR=https://gitlab.cern.ch/api/v4
$ export GITLAB_TOKEN=<YOUR_GITLAB_PAT>
|
Let’s give it a try:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
| # command mode:
$ steampipe query "select * from gitlab_version;"
+------------+-------------+------------------------------+
| version | revision | _ctx |
+------------+-------------+------------------------------+
| 15.10.8-ee | 9ec5f337f9f | {"connection_name":"gitlab"} |
+------------+-------------+------------------------------+
# interactive mode
$ steampipe query
> select * from gitlab_version;
+------------+-------------+------------------------------+
| version | revision | _ctx |
+------------+-------------+------------------------------+
| 15.10.8-ee | 9ec5f337f9f | {"connection_name":"gitlab"} |
+------------+-------------+------------------------------+
|
Show all available tables and describe them:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
| > .tables
==> gitlab
+-----------------------------------+-----------------------------------------------------------------------------------------------------------------------+
| table | description |
+-----------------------------------+-----------------------------------------------------------------------------------------------------------------------+
| gitlab_application | Obtain information about OAuth applications within the GitLab instance. |
| gitlab_branch | Obtain information on branches for a specific project within the GitLab instance. |
| gitlab_commit | Obtain information about commits for a specific project within the GitLab instance. |
| gitlab_epic | Obtain information about epics for a specific group within the GitLab instance. |
| gitlab_group | Obtain information about groups within the GitLab instance. |
...
> .inspect gitlab_issue
+-----------------------+--------------------------+----------------------------------------------------------------------------------+
| column | type | description |
+-----------------------+--------------------------+----------------------------------------------------------------------------------+
| assignee | text | The username of the user assigned to the issue - link to `gitlab_user.username` |
| assignee_id | bigint | The ID of the user assigned to the issue - link to `gitlab_user.id`. |
| assignees | jsonb | An array of assigned usernames, for when more than one user is assigned. |
|
Examples
List all issues that are assigned to me:
1
2
3
4
5
6
7
8
9
| > select title,labels from gitlab_issue where assignee = 'jhensche' and state = 'opened';
+----------------------------------------------------------------------+----------------------------------------------------------------------+
| title | labels |
+----------------------------------------------------------------------+----------------------------------------------------------------------+
| Document web redirector operations and troubleshooting | ["P::3"] |
| WR v2 CI improvements | [] |
| Document steps for high-availability in PaaS user docs | ["Area::docs","P::4","Project::PaaS","To Do"] |
| Improve secrets management: switch from Gitlab CI variables to Vault | ["Area::OKD4 deployment","In progress","P::2","Project::OKD4 infra"] |
+----------------------------------------------------------------------+----------------------------------------------------------------------+
|
Hmm, I would also like to see the project name, but the table only has a project_id
column (which is not very helpful).
Let’s resolve the ID to a name with SQL!
1
2
3
4
5
6
7
8
9
10
11
12
13
| > SELECT title,labels,gitlab_my_project.name
FROM gitlab_issue
LEFT JOIN gitlab_my_project
ON project_id = gitlab_my_project.id
WHERE assignee = 'jhensche' AND state = 'opened';
+----------------------------------------------------------------------+----------------------------------------------------------------------+------------------------+
| title | labels | name |
+----------------------------------------------------------------------+----------------------------------------------------------------------+------------------------+
| WR v2 CI improvements | [] | jhensche |
| Document web redirector operations and troubleshooting | ["P::3"] | webframeworks-planning |
| Document steps for high-availability in PaaS user docs | ["Area::docs","P::4","Project::PaaS","To Do"] | webframeworks-planning |
| Improve secrets management: switch from Gitlab CI variables to Vault | ["Area::OKD4 deployment","In progress","P::2","Project::OKD4 infra"] | webframeworks-planning |
+----------------------------------------------------------------------+----------------------------------------------------------------------+------------------------+
|
Show all issues that have been created in a project in the last week:
1
2
3
4
5
6
7
8
9
10
11
12
| SELECT title,author,created_at FROM gitlab_issue WHERE project_id = 85447 AND created_at >= '2023-06-05';
+----------------------------------------------------+----------+---------------------------+
| title | author | created_at |
+----------------------------------------------------+----------+---------------------------+
| Test webeos in Alma9 | foobarba | 2023-06-07T12:18:59+02:00 |
| Expore options for cleaning up test sites/previews | foobarba | 2023-06-07T12:11:53+02:00 |
| webeos-site-operator: Add server version in UPD | foobarba | 2023-06-07T12:07:14+02:00 |
| Find alternative of php-fpm for .htaccess | foobarba | 2023-06-07T12:01:48+02:00 |
| Review Apache configuration | foobarba | 2023-06-07T11:53:30+02:00 |
| Remove legacy-import from web-redirector-v2 | jhensche | 2023-06-06T09:38:31+02:00 |
| Prepare deployment for webeos in Alma9 | foobarba | 2023-06-07T11:46:17+02:00 |
+----------------------------------------------------+----------+---------------------------+
|
Kubernetes examples
Show all Statefulsets in a cluster:
1
2
3
4
5
6
7
8
9
10
11
12
13
| > select name,namespace from kubernetes_stateful_set;
+--------------------------------------------------+------------------------------------+
| name | namespace |
+--------------------------------------------------+------------------------------------+
| prometheus-user-workload | openshift-user-workload-monitoring |
| fluentd-aggregator | openshift-logging |
| alertmanager-main | openshift-monitoring |
| manila-csi-openstack-manila-csi-controllerplugin | openshift-cern-cephfs |
| sonarqube-sonarqube | test-alex-demo-category |
| prometheus-k8s | openshift-monitoring |
| thanos-ruler-user-workload | openshift-user-workload-monitoring |
| argocd-application-controller | openshift-cern-argocd |
+--------------------------------------------------+------------------------------------+
|
Let’s find all pods that use a CVMFS volume:
1
2
3
4
5
6
7
8
9
10
| > .inspect kubernetes_persistent_volume_claim
> .inspect kubernetes_pod
> SELECT name,namespace FROM kubernetes_persistent_volume_claim WHERE storage_class LIKE 'cvmfs%';
+--------------+---------------------+
| name | namespace |
+--------------+---------------------+
| cvmfs-bril | brilview-diamantis |
| cvmfs-cms-ib | test-alex-paas-stg2 |
| cvmfs-bril | test-brilview |
+--------------+---------------------+
|
Version 1 (basic: checks if a volume named *cvmfs*
is attached to a pod, this will likely miss some variants):
1
| > SELECT name,namespace FROM kubernetes_pod WHERE volumes #>> '{}' LIKE '%cvmfs%';
|
Version 2 (advanced: checks if the pod mounts a PVC named *cvmfs*
, still might lead to some false positives):
1
2
3
4
5
6
7
| > SELECT name,namespace
FROM kubernetes_pod
WHERE EXISTS (
SELECT TRUE
FROM jsonb_array_elements(volumes) x
WHERE x->'persistentVolumeClaim'->>'claimName' LIKE 'cvmfs%'
);
|
Version 3 (most advanced: lists the names of all PVCs with storage class cvmfs
and checks which pods mount these PVCs):
1
2
3
4
5
6
7
8
9
10
11
| > SELECT name,namespace
FROM kubernetes_pod
WHERE EXISTS (
SELECT TRUE
FROM jsonb_array_elements(volumes) x
WHERE x->'persistentVolumeClaim'->>'claimName' IN (
SELECT name
FROM kubernetes_persistent_volume_claim
WHERE storage_class LIKE 'cvmfs%'
)
);
|
Leverage the full power of PostgreSQL: store the output in a temporary table!
1
2
3
4
5
6
7
8
9
10
11
12
| > CREATE TEMPORARY TABLE namespaces_with_cvmfs AS (SELECT name,namespace,volumes
FROM kubernetes_pod
WHERE EXISTS (
SELECT TRUE
FROM jsonb_array_elements(volumes) x
WHERE x->'persistentVolumeClaim'->>'claimName' IN (
SELECT name
FROM kubernetes_persistent_volume_claim
WHERE storage_class LIKE 'cvmfs%'
)
));
SELECT * FROM namespaces_with_cvmfs;
|
Find out who the owner of these namespaces is:
1
2
3
4
5
6
7
8
9
10
11
| > SELECT
labels->>'lifecycle.webservices.cern.ch/owner' AS owner,
labels->>'lifecycle.webservices.cern.ch/resourceCategory' AS category
FROM kubernetes_namespace
WHERE name IN (SELECT namespace FROM namespaces_with_cvmfs);
+----------+----------+
| owner | category |
+----------+----------+
| foobarba | Official |
| deadbeef | Test |
+----------+----------+
|
Happy querying!
Resources: