Query your cloud infrastructure interactively with Steampipe!

I recently gave a lightning talk about Steampipe at CERN. The recording is available here, the following is a transcript of the talk.


Cloud infrastructures tends to have lots of moving pieces: containers, loadbalancers, virtual machines, databases etc. Steampipe is a tool that allows querying all these pieces through a single interface with SQL. Are you tired of writing brittle Bash and JQ scripts? Then this is the tool for you!

The well-known tool for dealing with JSON APIs of any kind (AWS, Gitlab, Kubernetes, …) is jq: it allows filtering, modifying and even creating JSON on the command line.

Here’s an example. Given the following output from the OpenStack CLI (which shows all sorts of details about VMs):

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
$ openstack server list -f json
[
  {
    "ID": "06f3a71e-ecc5-42da-94f4-2af8646036a5",
    "Name": "webeos-proto-tzv8k",
    "Status": "ACTIVE",
    "Networks": {
      "pbulic": [
        "122.122.122.122"
      ]
    },
    "Image": "fedora-coreos-33.20210217.3.0-openstack.x86_64",
    "Flavor": "m2.xlarge"
  },
  {
    "ID": "90d7b567-e95c-40f9-be12-68d52d8438af",
    "Name": "infra-4rvnj",
    "Status": "ACTIVE",
    "Networks": {
      "public": [
        "123.123.123.123"
      ]
    },
    "Image": "fedora-coreos-33.20210217.3.0-openstack.x86_64",
    "Flavor": "m2.xlarge"
  },
...

we can extract only the disk images used for those VMs and count their occurences:

1
2
3
4
5
6
7
$ openstack server list -f json | jq '.[] | .Image' | sort | uniq -c
      4 ""
      4 "fedora-coreos-33.20210217.3.0-openstack.x86_64"
      7 "fedora-coreos-34.20210904.3.0-openstack.x86_64"
      9 "fedora-coreos-35.20220327.3.0-openstack.x86_64"
     60 "fedora-coreos-36.20220716.3.1-openstack.x86_64"
      6 "fedora-coreos-37.20221127.3.0-openstack.x86_64"

Or query all pods deployed in a Kubernetes cluster that do not deploy from docker.io:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
$ oc get pods -A -o json | \
	jq -r '.items[].spec.containers[] | select(.image | startswith("docker.io") | not) | "\(.name): \(.image)"' | \
	sort -u
cephfs-provisioner: k8s.gcr.io/sig-storage/csi-provisioner:v3.0.0
cephfs-registrar: k8s.gcr.io/sig-storage/csi-node-driver-registrar:v2.4.0
cephfs-resizer: k8s.gcr.io/sig-storage/csi-resizer:v1.3.0
cephfs-snapshotter: k8s.gcr.io/sig-storage/csi-snapshotter:v5.0.1
cert-manager: quay.io/jetstack/cert-manager-cainjector:v1.5.5
cert-manager: quay.io/jetstack/cert-manager-controller:v1.5.5
cert-manager: quay.io/jetstack/cert-manager-webhook:v1.5.5
csi-provisioner: registry.k8s.io/sig-storage/csi-provisioner:v3.2.1
csi-resizer: registry.k8s.io/sig-storage/csi-resizer:v1.5.0
csi-snapshotter: registry.k8s.io/sig-storage/csi-snapshotter:v6.0.1
driver-registrar: registry.k8s.io/sig-storage/csi-node-driver-registrar:v2.5.1
external-dns: registry.k8s.io/external-dns/external-dns:v0.13.2
garbage-collector: image-registry.openshift-image-registry.svc:5000/openshift/cli:latest
node-problem-detector: k8s.gcr.io/node-problem-detector/node-problem-detector:v0.8.7
provisioner: k8s.gcr.io/sig-storage/csi-provisioner:v3.2.1
registrar: registry.k8s.io/sig-storage/csi-node-driver-registrar:v2.5.1
registry-server: registry.access.redhat.com/redhat/community-operator-index:v4.11
sitedetails-operator: gitlab-registry.cern.ch/paas-tools/operators/sitedetails-operator:RELEASE.2023.05.08T16-05-52Z
user-alerts-adapter: gitlab-registry.cern.ch/paas-tools/okd4-deployment/user-alerts-adapter:RELEASE.2023.03.08T10-57-46Z
webservices-blocker-controller: gitlab-registry.cern.ch/paas-tools/okd4-deployment/blocker-controller:RELEASE.2023.01.25T12-40-44Z

Or this recent example of query that took me multiple hours to put together because it’s very difficult to use multiple data sources with a JQ query.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
# get full JSON of all PVCs with type "cvmfs"
oc get pvc -A -o json | jq -r '{items: [.items[] | select(.spec.storageClassName | startswith("cvmfs-")?) | .]}' > cvmfs-pvcs.json

# get all running pods
oc get pods -A --field-selector=status.phase=Running -o json > running-pods.json

# obtain comma-separated and quoted list of all PVCs using CVMFS
cvmfs_pvcs_list=$(jq -c '[.items[]| .metadata.name]' cvmfs-pvcs.json | tr -d \[\])

# identify on which nodes these volumes are mounted
nodes_with_cvmfs=$(jq -r '.items[] | select(.spec.volumes[].persistentVolumeClaim.claimName | IN('$cvmfs_pvcs_list')) | .spec.nodeName' running-pods.json | sort -u)

We can see that jq is really handy for performing quick look ups, but there are two drawbacks:

  1. data always flows linearly from left to right, and all data needs to be present in the initial input (no dynamic lookups)
  2. the query language is completely custom and needs to learned from scratch (usually including a lot of web searches…)

#  Introducing: Steampipe

Steampipe allows querying “Cloud” resources with regular SQL

Here are some data sources that Steampipe supports with plugins:

  • Cloud providers: AWS, Azure, DigitalOcean, Equinex, Fly.io, GCP, Hetzner, IBM Cloud, Linode, OVH
  • Kubernetes, Keycloak, LDAP, Nomad
  • Local files: INI, JSON, YAML, CSV
  • VCS: GitLab, GitHub, Bitbucket
  • Jira, Confluence
  • Microsoft 365

#  Set up

Head to https://steampipe.io/downloads for up-to-date installation instructions.

1
2
3
4
5
6
7
8
$ steampipe -v
Steampipe v0.20.3

$ steampipe plugin install turbot/kubernetes theapsgroup/gitlab

$ export KUBE_CONFIG_PATHS=<PATH_TO_YOUR_KUBECONFIG>
$ export GITLAB_ADDR=https://gitlab.cern.ch/api/v4
$ export GITLAB_TOKEN=<YOUR_GITLAB_PAT>

Let’s give it a try:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
# command mode:
$ steampipe query "select * from gitlab_version;"
+------------+-------------+------------------------------+
| version    | revision    | _ctx                         |
+------------+-------------+------------------------------+
| 15.10.8-ee | 9ec5f337f9f | {"connection_name":"gitlab"} |
+------------+-------------+------------------------------+

# interactive mode
$ steampipe query
> select * from gitlab_version;
+------------+-------------+------------------------------+
| version    | revision    | _ctx                         |
+------------+-------------+------------------------------+
| 15.10.8-ee | 9ec5f337f9f | {"connection_name":"gitlab"} |
+------------+-------------+------------------------------+

Show all available tables and describe them:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
> .tables
 ==> gitlab
+-----------------------------------+-----------------------------------------------------------------------------------------------------------------------+
| table                             | description                                                                                                           |
+-----------------------------------+-----------------------------------------------------------------------------------------------------------------------+
| gitlab_application                | Obtain information about OAuth applications within the GitLab instance.                                               |
| gitlab_branch                     | Obtain information on branches for a specific project within the GitLab instance.                                     |
| gitlab_commit                     | Obtain information about commits for a specific project within the GitLab instance.                                   |
| gitlab_epic                       | Obtain information about epics for a specific group within the GitLab instance.                                       |
| gitlab_group                      | Obtain information about groups within the GitLab instance.                                                           |
...

> .inspect gitlab_issue
+-----------------------+--------------------------+----------------------------------------------------------------------------------+
| column                | type                     | description                                                                      |
+-----------------------+--------------------------+----------------------------------------------------------------------------------+
| assignee              | text                     | The username of the user assigned to the issue - link to `gitlab_user.username`  |
| assignee_id           | bigint                   | The ID of the user assigned to the issue - link to `gitlab_user.id`.             |
| assignees             | jsonb                    | An array of assigned usernames, for when more than one user is assigned.         |

#  Examples

List all issues that are assigned to me:

1
2
3
4
5
6
7
8
9
> select title,labels from gitlab_issue where assignee = 'jhensche' and state = 'opened';
+----------------------------------------------------------------------+----------------------------------------------------------------------+
| title                                                                | labels                                                               |
+----------------------------------------------------------------------+----------------------------------------------------------------------+
| Document web redirector operations and troubleshooting               | ["P::3"]                                                             |
| WR v2 CI improvements                                                | []                                                                   |
| Document steps for high-availability in PaaS user docs               | ["Area::docs","P::4","Project::PaaS","To Do"]                        |
| Improve secrets management: switch from Gitlab CI variables to Vault | ["Area::OKD4 deployment","In progress","P::2","Project::OKD4 infra"] |
+----------------------------------------------------------------------+----------------------------------------------------------------------+

Hmm, I would also like to see the project name, but the table only has a project_id column (which is not very helpful).

Let’s resolve the ID to a name with SQL!

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
> SELECT title,labels,gitlab_my_project.name
FROM gitlab_issue
LEFT JOIN gitlab_my_project
ON project_id = gitlab_my_project.id
WHERE assignee = 'jhensche' AND state = 'opened';
+----------------------------------------------------------------------+----------------------------------------------------------------------+------------------------+
| title                                                                | labels                                                               | name                   |
+----------------------------------------------------------------------+----------------------------------------------------------------------+------------------------+
| WR v2 CI improvements                                                | []                                                                   | jhensche               |
| Document web redirector operations and troubleshooting               | ["P::3"]                                                             | webframeworks-planning |
| Document steps for high-availability in PaaS user docs               | ["Area::docs","P::4","Project::PaaS","To Do"]                        | webframeworks-planning |
| Improve secrets management: switch from Gitlab CI variables to Vault | ["Area::OKD4 deployment","In progress","P::2","Project::OKD4 infra"] | webframeworks-planning |
+----------------------------------------------------------------------+----------------------------------------------------------------------+------------------------+

Show all issues that have been created in a project in the last week:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
SELECT title,author,created_at FROM gitlab_issue WHERE project_id = 85447 AND created_at >= '2023-06-05';
+----------------------------------------------------+----------+---------------------------+
| title                                              | author   | created_at                |
+----------------------------------------------------+----------+---------------------------+
| Test webeos in Alma9                               | foobarba | 2023-06-07T12:18:59+02:00 |
| Expore options for cleaning up test sites/previews | foobarba | 2023-06-07T12:11:53+02:00 |
| webeos-site-operator: Add server version in UPD    | foobarba | 2023-06-07T12:07:14+02:00 |
| Find alternative of php-fpm for .htaccess          | foobarba | 2023-06-07T12:01:48+02:00 |
| Review Apache configuration                        | foobarba | 2023-06-07T11:53:30+02:00 |
| Remove legacy-import from web-redirector-v2        | jhensche | 2023-06-06T09:38:31+02:00 |
| Prepare deployment for webeos in Alma9             | foobarba | 2023-06-07T11:46:17+02:00 |
+----------------------------------------------------+----------+---------------------------+

#  Kubernetes examples

Show all Statefulsets in a cluster:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
> select name,namespace from kubernetes_stateful_set;
+--------------------------------------------------+------------------------------------+
| name                                             | namespace                          |
+--------------------------------------------------+------------------------------------+
| prometheus-user-workload                         | openshift-user-workload-monitoring |
| fluentd-aggregator                               | openshift-logging                  |
| alertmanager-main                                | openshift-monitoring               |
| manila-csi-openstack-manila-csi-controllerplugin | openshift-cern-cephfs              |
| sonarqube-sonarqube                              | test-alex-demo-category            |
| prometheus-k8s                                   | openshift-monitoring               |
| thanos-ruler-user-workload                       | openshift-user-workload-monitoring |
| argocd-application-controller                    | openshift-cern-argocd              |
+--------------------------------------------------+------------------------------------+

Let’s find all pods that use a CVMFS volume:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
> .inspect kubernetes_persistent_volume_claim
> .inspect kubernetes_pod
> SELECT name,namespace FROM kubernetes_persistent_volume_claim WHERE storage_class LIKE 'cvmfs%';
+--------------+---------------------+
| name         | namespace           |
+--------------+---------------------+
| cvmfs-bril   | brilview-diamantis  |
| cvmfs-cms-ib | test-alex-paas-stg2 |
| cvmfs-bril   | test-brilview       |
+--------------+---------------------+

Version 1 (basic: checks if a volume named *cvmfs* is attached to a pod, this will likely miss some variants):

1
> SELECT name,namespace FROM kubernetes_pod WHERE volumes #>> '{}' LIKE '%cvmfs%';

Version 2 (advanced: checks if the pod mounts a PVC named *cvmfs*, still might lead to some false positives):

1
2
3
4
5
6
7
> SELECT name,namespace
FROM kubernetes_pod
WHERE EXISTS (
          SELECT TRUE
          FROM jsonb_array_elements(volumes) x
          WHERE x->'persistentVolumeClaim'->>'claimName' LIKE 'cvmfs%'
);

Version 3 (most advanced: lists the names of all PVCs with storage class cvmfs and checks which pods mount these PVCs):

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
> SELECT name,namespace
FROM kubernetes_pod
WHERE EXISTS (
          SELECT TRUE
          FROM jsonb_array_elements(volumes) x
          WHERE x->'persistentVolumeClaim'->>'claimName' IN (
                    SELECT name
                    FROM kubernetes_persistent_volume_claim
                    WHERE storage_class LIKE 'cvmfs%'
          )
);

Leverage the full power of PostgreSQL: store the output in a temporary table!

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
> CREATE TEMPORARY TABLE namespaces_with_cvmfs AS (SELECT name,namespace,volumes
FROM kubernetes_pod
WHERE EXISTS (
          SELECT TRUE
          FROM jsonb_array_elements(volumes) x
          WHERE x->'persistentVolumeClaim'->>'claimName' IN (
                    SELECT name
                    FROM kubernetes_persistent_volume_claim
                    WHERE storage_class LIKE 'cvmfs%'
          )
));
SELECT * FROM namespaces_with_cvmfs;

Find out who the owner of these namespaces is:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
> SELECT
	labels->>'lifecycle.webservices.cern.ch/owner' AS owner,
	labels->>'lifecycle.webservices.cern.ch/resourceCategory' AS category
  FROM kubernetes_namespace
  WHERE name IN (SELECT namespace FROM namespaces_with_cvmfs);
+----------+----------+
| owner    | category |
+----------+----------+
| foobarba | Official |
| deadbeef | Test     |
+----------+----------+

Happy querying!


Resources: