Monitoring HTTP traffic of Kubernetes applications with mitmproxy

In this post I want to share a method I used recently for understanding the network activity of an application running on Kubernetes. Specifically, the application I was looking at was velero (a backup and recovery tool): I could not figure out which object storage endpoints it was talking to and which credentials it was using.

I searched the internet for a transparent HTTP proxy that can intercept outgoing network traffic and log details about the HTTP requests and responses. Of course, on a lower layer this is possible using good ol’ tcpdump, but that quickly stops being useful when you’re dealing with encrypted HTTPS traffic. In that case the only information we could see would be the origin and destination IP and port pairs plus the server hostname (via SNI) – not nearly enough useful information for debugging. At the same time one could also make the argument that this is a perfect use case for a service mesh (Istio, Linkerd & co), however for debugging a single application, setting up a service mesh would be complete overkill.
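
For illustration, this is roughly all that packet capturing reveals for HTTPS traffic (the interface and filter below are just an example); beyond the TLS handshake everything is opaque ciphertext:

# capture HTTPS traffic on all interfaces; only IPs, ports and the
# ClientHello (including the SNI hostname) are readable in the output
tcpdump -nn -i any 'tcp port 443'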

I came across the mitmproxy project which bills itself as an interactive TLS-capable, intercepting HTTP proxy for penetration testers and software developers. mitmproxy can capture outgoing HTTP(S) requests and allows you to modify requests and responses in real time via a TUI (terminal client), a web interface or a Python API. For the purpose of my debugging I only needed traffic logging, which can be achieved with the bundled mitmdump command.
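
By the way, to get a feel for the tool first, it can also be run locally via Docker without involving Kubernetes at all (same image and command as used below):

docker run --rm -it -p 8080:8080 docker.io/mitmproxy/mitmproxy:latest mitmdump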

To get started, let’s spin up a new Deployment using the official Docker image:

kubectl create deployment mitmproxy --image=docker.io/mitmproxy/mitmproxy:latest --port=8080 -- mitmdump --verbose
kubectl set env deployment/mitmproxy PYTHONUNBUFFERED=1 HOME=/tmp
kubectl create service clusterip mitmproxy --tcp=8080:8080

We’re setting the command of the container to mitmdump so that the requests and responses are printed on the terminal (we don’t need any interactivity). The environment variable PYTHONUNBUFFERED is set so that the output is printed immediately (more details). The environment variable HOME is set to /tmp because this directory is guaranteed to be writable. Upon starting, mitmproxy will generate a CA private key and root certificate and put them there:

$ kubectl exec deployment/mitmproxy -- ls -AR /tmp
/tmp:
.mitmproxy

/tmp/.mitmproxy:
mitmproxy-ca-cert.cer  mitmproxy-ca-cert.p12  mitmproxy-ca-cert.pem  mitmproxy-ca.p12  mitmproxy-ca.pem  mitmproxy-dhparam.pem

This leads us into the difficult topic of certificate management. In 2024 most HTTP connections are encrypted by default, especially if they are leaving the context of your Kubernetes cluster. However, to be able to see the contents of the requests, mitmproxy needs to decrypt the outgoing request and then re-encrypt it before sending it to the upstream server. For this purpose the above-mentioned certificate authority (CA) is used. But this CA is not trusted by other clients (yet), such as the velero application. To change that we will use the following trick:

# extract root certificate of mitmproxy
kubectl cp mitmproxy-7f8448c848-nkndp:/tmp/.mitmproxy/mitmproxy-ca-cert.pem mitmproxy-ca-cert.pem

# generate a combined list of publicly trusted CAs and mitmproxy CA
cat /etc/ssl/certs/ca-certificates.crt mitmproxy-ca-cert.pem > ca-certificates.crt

# store this combined list in a configmap
kubectl create configmap ca-certificates --from-file=ca-certificates.crt

# mount it into the application (velero in my case); a strategic merge patch
# is used so the volumes/containers lists are merged by name, not replaced
kubectl patch deployment/velero --type strategic -p '{
  "spec": {
    "template": {
      "spec": {
        "volumes": [
          {
            "name": "ca-certificates",
            "configMap": {
              "name": "ca-certificates"
            }
          }
        ],
        "containers": [
          {
            "name": "velero",
            "volumeMounts": [
              {
                "name": "ca-certificates",
                "mountPath": "/etc/ssl/certs/ca-certificates.crt",
                "subPath": "ca-certificates.crt"
              }
            ]
          }
        ]
      }
    }
  }
}'
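
Before pointing the real application at the proxy, the combined bundle can be sanity-checked with a throwaway pod; the pod name, curl image and target URL below are merely illustrative choices:

kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: proxy-test
spec:
  restartPolicy: Never
  volumes:
    - name: ca-certificates
      configMap:
        name: ca-certificates
  containers:
    - name: curl
      image: docker.io/curlimages/curl:latest
      # fetch a page through mitmproxy, validating the re-signed certificate
      # against the combined CA bundle
      command: ["curl", "--silent", "--verbose",
                "--cacert", "/etc/ssl/certs/ca-certificates.crt",
                "--proxy", "http://mitmproxy:8080",
                "https://example.org"]
      volumeMounts:
        - name: ca-certificates
          mountPath: /etc/ssl/certs/ca-certificates.crt
          subPath: ca-certificates.crt
EOF

# once the pod has completed, inspect its output
kubectl logs pod/proxy-test

If everything is wired up correctly, the request succeeds and simultaneously appears in the mitmproxy logs.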

Lastly, we need to tell the application that it should send its traffic through the mitmproxy service we created earlier. Luckily, most well-behaved applications support the simple HTTP_PROXY environment variable, as for example velero does:

kubectl set env deployment/velero HTTP_PROXY=http://mitmproxy:8080 HTTPS_PROXY=http://mitmproxy:8080 NO_PROXY=172.30.0.1

In my case I’m setting NO_PROXY to 172.30.0.1 (a.k.a. kubernetes.default.svc.cluster.local) so that connections to the Kubernetes API server do not get proxied, because I was not interested in those requests, but only the external ones.
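
The service IP of the API server differs from cluster to cluster; you can look up the right value for your cluster like this:

# prints the ClusterIP behind kubernetes.default.svc.cluster.local
kubectl get service kubernetes --namespace default --output jsonpath='{.spec.clusterIP}'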

Note: make sure that no NetworkPolicies are blocking connections between the client application and the mitmproxy deployment – this can lead to very cryptic errors that are hard to troubleshoot.
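
If your cluster enforces such policies, a minimal allow rule could look like the following sketch; it relies on the app=mitmproxy label that kubectl create deployment sets by default:

kubectl apply -f - <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-mitmproxy-ingress
spec:
  podSelector:
    matchLabels:
      app: mitmproxy
  ingress:
    # allow any client to reach the proxy port; tighten with a "from"
    # clause if you only want specific pods to use the proxy
    - ports:
        - protocol: TCP
          port: 8080
EOF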

If the application does not support configuring an HTTP proxy for outgoing requests, the mitmproxy documentation also has instructions for setting up a transparent proxy, i.e. a mode where no client configuration is required.

Finally, we should be able to see the intercepted HTTP requests and responses in the logs:

$ kubectl logs deploy/mitmproxy
[09:22:34.791] HTTP(S) proxy listening at *:8080.
[09:23:28.535][10.76.3.78:57318] server connect s3.cern.ch:443
10.76.3.78:57318: GET https://s3.cern.ch/foo-bar?delimiter=%2F&list-type=2&prefix=velero%2F HTTP/2.0
    amz-sdk-request: attempt=1; max=3
    x-amz-date: 20240812T092328Z
    authorization: AWS4-HMAC-SHA256 ...
    x-amz-content-sha256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
    user-agent: aws-sdk-go-v2/1.21.0 os/linux lang/go#1.21.9 md/GOOS#linux md/GOARCH#amd64 api/s3#1.40.0
    accept-encoding: identity
    amz-sdk-invocation-id: 8f852759-6c85-4dd8-b4e2-e8484f0eec9f
 << HTTP/2.0 200 OK 354b
    bucket: foo-bar
    content-type: application/xml
    date: Mon, 12 Aug 2024 09:23:28 GMT
    x-amz-request-id: tx00000a75a672522ec788b-0066b9d490-3b408a73-default
10.76.3.78:57306: GET https://s3.cern.ch/foo-bar?delimiter=%2F&list-type=2&prefix=velero%2Fbackups%2F HTTP/2.0
    amz-sdk-request: attempt=1; max=3
    x-amz-date: 20240812T092328Z
    authorization: AWS4-HMAC-SHA256 ...
    x-amz-content-sha256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
    user-agent: aws-sdk-go-v2/1.21.0 os/linux lang/go#1.21.9 md/GOOS#linux md/GOARCH#amd64 api/s3#1.40.0
    accept-encoding: identity
 << HTTP/2.0 200 OK 2.7k
    bucket: foo-bar
    content-type: application/xml
    date: Mon, 12 Aug 2024 09:23:28 GMT
    x-amz-request-id: tx00000ec08b32bae36d151-0066b9d490-3b453ea1-default
[09:23:28.582][10.76.3.78:57318] client disconnect
[09:23:28.582][10.76.3.78:57318] closing transports...
[09:23:28.582][10.76.3.78:57318] server disconnect s3.cern.ch:443
[09:23:28.583][10.76.3.78:57318] transports closed!
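
Once you are done debugging, remember to revert everything (a trailing - unsets an environment variable):

# remove the proxy settings from the application again
kubectl set env deployment/velero HTTP_PROXY- HTTPS_PROXY- NO_PROXY-

# tear down the debugging helpers
kubectl delete deployment/mitmproxy service/mitmproxy configmap/ca-certificates

The CA bundle mount can be removed again by re-applying the application’s original manifest.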

Happy debugging!