Switching from Nginx to Caddy - or not?

Posted on Oct 19, 2022

In my homelab I’m self-hosting a couple of static websites from Minio S3 buckets - including the blog you are reading this article on. Using S3 buckets for static file hosting is great, because while the S3 interface was originally proprietary to AWS, it is nowadays widely supported by many tools and services. In addition, the S3 API comes with excellent built-in authorization primitives: in practice, this means it’s very easy to generate temporary access keys and distribute them to various people, machines etc., because the keys can be restricted to least privileges and revoked trivially.

However, there is one issue: while the S3 API is directly available via HTTP, it is not “web browser friendly”, i.e. it does not support pretty URLs like http://s3.example.com/foobar/. Instead, one would need to make sure to always navigate to a particular file, such as http://s3.example.com/foobar/index.html (otherwise the user will see an ugly XML error page). This might have been acceptable behavior 20 years ago, but today users are accustomed to better UX. Therefore, an intermediate proxy is necessary to perform URL rewriting: for example, when the requested path ends with a slash (/), then the index.html suffix should automatically be added before requesting the file from the S3 endpoint.
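
The rewriting rule described above is small enough to sketch in a few lines of Python. This is only an illustration of the intended behavior, not how any particular proxy implements it; the function name `to_object_path` and the `index_document` parameter are made up for this example.

```python
def to_object_path(request_path: str, index_document: str = "index.html") -> str:
    """Return the path the proxy should request from the S3 endpoint."""
    # A trailing slash means the client asked for a "directory", so the
    # index document is appended; all other paths are passed through as-is.
    if request_path.endswith("/"):
        return request_path + index_document
    return request_path


if __name__ == "__main__":
    for path in ("/", "/foobar/", "/foobar/index.html"):
        print(f"{path} -> {to_object_path(path)}")
```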

             GET /FOOBAR           GET /FOOBAR/INDEX.HTML
   ┌────────┐         ┌───────────────┐         ┌──────┐
   │        │ ──────► │ REVERSE PROXY │ ──────► │      │
   │ CLIENT │         │               │         │  S3  │
   │        │ ◄────── │   + CACHE     │ ◄────── │      │
   └────────┘         └───────────────┘         └──────┘
               200 OK                    200 OK

In addition, this proxy should also cache the content for some time. This way, the service can be scaled to a large number of requests by simply increasing the number of proxies (spread across multiple machines), without impacting the backend (a “mini” CDN).
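
To make the caching idea concrete, here is a toy time-based cache in Python. It is a sketch under simplifying assumptions (in-memory only, no revalidation, no stale-on-error handling) and is not meant to mirror what Nginx or any Caddy module does internally; `TTLCache` and `fetch_from_backend` are hypothetical names.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Toy response cache: serve a stored copy until it is older than ttl_seconds."""

    def __init__(self, fetch: Callable[[str], str], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get(self, path: str) -> Tuple[str, str]:
        """Return (body, status), where status is "HIT" or "MISS"."""
        now = self._clock()
        entry = self._entries.get(path)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1], "HIT"
        body = self._fetch(path)  # only cache misses reach the backend
        self._entries[path] = (now, body)
        return body, "MISS"


if __name__ == "__main__":
    backend_requests = []

    def fetch_from_backend(path: str) -> str:
        # Stand-in for the real request to the S3 endpoint.
        backend_requests.append(path)
        return f"<contents of {path}>"

    cache = TTLCache(fetch_from_backend, ttl_seconds=3600)
    print(cache.get("/index.html")[1])
    print(cache.get("/index.html")[1])
    print(len(backend_requests))
```

Because only misses call the fetch function, adding more proxies with such a cache increases the request capacity without proportionally increasing the load on the backend.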

So far, I have been running such a setup with Nginx. However, over the last couple of years a new contender in the realm of static file serving and reverse proxying has emerged: Caddy.

Caddy boasts easy configuration (many people have been bitten by footguns in Nginx' configuration file), an included configuration API, automatic HTTPS, high performance, and extensibility (thanks to its modules). The easy configuration in particular is often cited on Hacker News et al. as one of the major advantages of Caddy.

So I decided to give it a try and convert the following Nginx configuration into an equivalent Caddy config (Caddyfile). In this post I will go over reverse proxying with Caddy, URL rewriting, modifying headers, disallowing methods, caching and metrics monitoring.

In this particular example, the reverse proxy should listen to blog.cubieserver.de and forward the requests to s3.cubieserver.de/blog-cubieserver-de/ (while preserving the rest of the URL path). I won’t explain the details of the Nginx configuration, since this post will focus on the Caddy part.

server {
    listen 8080 default_server;

    set $s3_bucket "blog-cubieserver-de";

    location / {
        # only forward read verbs to the backend
        limit_except HEAD GET {
            deny all;
        }

        # redirect */ to */index.html, so Minio backend finds the file
        rewrite ^(.*)/$ $1/index.html break;

        # hide these Minio headers
        proxy_hide_header "X-AMZ-Bucket-Region";
        proxy_hide_header "X-AMZ-Request-Id";

        # let Nginx cache resources from Minio for one hour
        proxy_cache blog_cache;
        proxy_cache_valid 200 1h;
        proxy_cache_use_stale error timeout http_500 http_502 http_503 http_504;
        proxy_cache_revalidate on;
        proxy_cache_lock on;

        add_header X-Proxy-Cache $upstream_cache_status;

        # configure client cache for one day
        expires 1d;
        add_header Pragma public;
        add_header Cache-Control "public";

        # Set correct Host header
        proxy_set_header Host s3.cubieserver.de;
        # $uri already contains leading slash
        proxy_pass https://s3.cubieserver.de/$s3_bucket$uri;

        # use kubernetes DNS resolver
        resolver kube-dns.kube-system.svc.cluster.local ipv6=off;
    }
}

server {
    listen 8081 default_server;

    location /_/healthz {
        access_log off;
        error_log /dev/stderr error;
        return 200;
    }
}

# Basic reverse proxy

Let’s try to replicate the Nginx configuration above step-by-step with Caddy. A basic Caddyfile for reverse proxying looks like this:

# Listen explicitly for HTTP connections, such that Caddy doesn't start its automatic certificate setup.
:8080 {
	# prepend bucket name to URI
	rewrite * /blog-cubieserver-de{path}

	# Set up proxy to the backend endpoint
	# https://caddyserver.com/docs/caddyfile/directives/reverse_proxy
	reverse_proxy {
		to https://s3.cubieserver.de/

		# set modified Host header when sending upstream request
		header_up Host {upstream_hostport}
	}
}

Here we have the first difference from Nginx: if the to section contains the upstream path (to https://s3.cubieserver.de/blog-cubieserver-de/), Caddy will complain that:

Caddyfile:6 - Error during parsing: for now, URLs for proxy upstreams only support scheme, host, and port components

Therefore we first need to rewrite the URL internally (the rewrite directive above), before we can forward it to the upstream (the to line inside the reverse_proxy block).

Let’s give it a try:

$ curl -sL https://github.com/caddyserver/caddy/releases/download/v2.6.2/caddy_2.6.2_linux_amd64.tar.gz | tar zvxf -
$ ./caddy run --config Caddyfile --adapter caddyfile &
$ curl -I localhost:8080/index.html
HTTP/1.1 200 OK
Accept-Ranges: bytes
Content-Length: 3604
Content-Security-Policy: block-all-mixed-content
Content-Type: text/html
Date: Mon, 17 Oct 2022 19:53:26 GMT
Etag: "00000000000000000000000000000000-1"
Last-Modified: Mon, 22 Nov 2021 20:00:31 GMT
Server: Caddy
Server: MinIO
Strict-Transport-Security: max-age=31536000; includeSubDomains
Vary: Origin
Vary: Accept-Encoding
X-Amz-Request-Id: 171EF348E3F35270
X-Content-Type-Options: nosniff
X-Xss-Protection: 1; mode=block

So far, so good!

# Inflight URL rewriting

As explained in the introduction, when requesting a folder (instead of an existing file), we get a nasty 404 response. Let’s fix that next: essentially, we need to rewrite all requests that have a trailing slash (regex: ^(.*)/$) to the index file.

	@pathWithoutFile {
		path_regexp ^(.*)/$
	}
	# Rewrite paths ending with '/' to '/index.html'.
	rewrite @pathWithoutFile /blog-cubieserver-de{path}index.html
	# Rewrite other requests to have the appropriate bucket prefix, too.
	# Note that this is mutually exclusive with the previous rule.
	rewrite * /blog-cubieserver-de{path}

Here I came across my first footgun with Caddy: rewrite directives are mutually exclusive - only one of them is applied to a given request, they are not chained in sequential order! Initially, I was expecting to be able to use the following snippet:

	@pathWithoutFile {
		path_regexp ^(.*)/$
	}
	# THIS DOES *NOT* WORK
	rewrite @pathWithoutFile {path}index.html
	rewrite * /blog-cubieserver-de{path}

In my mind, this should rewrite /foobar/ to /foobar/index.html and then /blog-cubieserver-de/foobar/index.html. While Caddy will happily accept this configuration, it will silently apply just one of the rules - it took me quite some time to identify the issue. To be honest, I was shocked to find such unintuitive behavior so early during my exploration. But anyway, with the first snippet above it works:

$ curl -I localhost:8080/
HTTP/1.1 200 OK
Accept-Ranges: bytes
Content-Length: 9292
Content-Security-Policy: block-all-mixed-content
Content-Type: text/html; charset=utf-8
Date: Mon, 17 Oct 2022 20:01:27 GMT
Etag: "c163431ac3b02e8853cf12ab416ded12"
Last-Modified: Sun, 16 Oct 2022 15:07:00 GMT
Server: Caddy
Server: MinIO
Strict-Transport-Security: max-age=31536000; includeSubDomains
Vary: Origin
Vary: Accept-Encoding
X-Amz-Meta-Mtime: 1665932819.503718044
X-Amz-Request-Id: 171EF3B8EE06F29A
X-Content-Type-Options: nosniff
X-Xss-Protection: 1; mode=block
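
The difference between the two rewrite variants above can be modeled in Python. This is a rough sketch of the behavior described in this section, not Caddy's actual implementation; in particular, which of the two rules wins in the broken variant is an assumption (here: the one with the named matcher), and all function names are made up.

```python
import re

BUCKET_PREFIX = "/blog-cubieserver-de"
TRAILING_SLASH = re.compile(r"^(.*)/$")


def rewrite_chained(path: str) -> str:
    """What I expected: both rules are applied, one after the other."""
    if TRAILING_SLASH.match(path):
        path = path + "index.html"
    return BUCKET_PREFIX + path


def rewrite_first_match_only(path: str) -> str:
    """Model of the broken config: only one matching rule is applied.

    Assumption: the rule with the named matcher takes precedence over '*'.
    """
    if TRAILING_SLASH.match(path):
        return path + "index.html"  # the bucket prefix is never added
    return BUCKET_PREFIX + path


def rewrite_combined(path: str) -> str:
    """Model of the working config: each rule produces the final path itself."""
    if TRAILING_SLASH.match(path):
        return BUCKET_PREFIX + path + "index.html"
    return BUCKET_PREFIX + path


if __name__ == "__main__":
    for rewrite in (rewrite_chained, rewrite_first_match_only, rewrite_combined):
        print(rewrite.__name__, rewrite("/foobar/"))
```

Under this model, the broken variant sends requests with a trailing slash to the backend without the bucket prefix, which is why each rule in the working variant has to spell out the complete target path.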

# Modifying HTTP headers

Now we still have a bunch of unnecessary headers in the response. Let’s strip those:

	reverse_proxy {
		to https://s3.cubieserver.de

		# hide these headers (from S3/Minio) when sending response to client
		header_down -X-Amz-*
		header_down -Server
		# set appropriate Host header when sending upstream request
		# this is required because the upstream has a different hostname
		header_up Host {upstream_hostport}
	}

In a Caddyfile, the header_up directive is used for modifying request headers which should be sent to the backend (upstream), whereas the header_down directive is used for modifying response headers sent to the client. The fact that headers are removed by simply prefixing them with a minus (-) makes me a bit uneasy, but as long as it works reliably, it’s fine for me. It’s definitely nice to see that we can remove headers with wildcards (and don’t need to explicitly override each of them).
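
To illustrate what the wildcard removal does to the response shown earlier, here is a small Python model. It uses shell-style matching from the standard library, which is an assumption for the sake of the example and may be more permissive than the wildcards Caddy actually supports; `strip_headers` is a hypothetical helper.

```python
from fnmatch import fnmatchcase
from typing import Dict, Iterable


def strip_headers(headers: Dict[str, str], patterns: Iterable[str]) -> Dict[str, str]:
    """Return a copy of headers without the names matching any pattern.

    Header names are compared case-insensitively, as HTTP requires.
    """
    lowered = [pattern.lower() for pattern in patterns]
    return {
        name: value
        for name, value in headers.items()
        if not any(fnmatchcase(name.lower(), pattern) for pattern in lowered)
    }


if __name__ == "__main__":
    upstream_response = {
        "Content-Type": "text/html; charset=utf-8",
        "Server": "MinIO",
        "X-Amz-Meta-Mtime": "1665932819.503718044",
        "X-Amz-Request-Id": "171EF3B8EE06F29A",
    }
    print(strip_headers(upstream_response, ["X-Amz-*", "Server"]))
```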

# Restricting HTTP verbs

Next, we also want to ensure that we only proxy GET and HEAD requests, just like the initial Nginx config:

	# Only allow GET and HEAD requests, so no one can send API calls to the backend
	# https://caddyserver.com/docs/caddyfile/matchers#method
	@disallowedMethods {
		not method GET HEAD
	}
	# https://caddyserver.com/docs/caddyfile/directives/respond
	respond @disallowedMethods "HTTP Method Not Allowed" 405

And let’s try:

$ curl -X POST localhost:8080/index.html
HTTP Method Not Allowed

It works!

# Caching

Alright, let’s move on to setting up the proxy cache. After some searching around online, I was rather disappointed: Caddy does not have a built-in solution for caching upstream responses. There are several plugins (“Caddy modules”) available online: CDP Cache, Souin Caddy module, Caddy Cache Handler and Caddy Cache - oh no wait, the last one is only for Caddy v1 and already deprecated. This is exactly the reason why I’m not a fan of “external” plugins: when there is a serious bug or security vulnerability, you have no idea if it’s ever going to be addressed.

Anyway, looking at the remaining options, Souin and Cache Handler seem rather complex and focused on distributed caching, which is not something I necessarily need in my setup. CDP Cache looks much simpler, but the quality of the repository is not very convincing.

In any case, all these modules have a huge drawback: they are external modules that need to be compiled together with Caddy. Caddy offers the xcaddy command-line tool, which allows you to easily build Caddy with plugins.

xcaddy build v2.4.1 --with github.com/sillygod/cdp-cache

That’s all well and good for local development, but how is that supposed to work in a containerized environment? You don’t want to build the server binary every time you start the container!

The recommended approach is building your container image with the required modules:

FROM caddy:<version>-builder AS builder

RUN xcaddy build \
    --with github.com/caddyserver/nginx-adapter \
    --with github.com/hairyhenderson/caddy-teapot-module@v0.0.3-0

FROM caddy:<version>

COPY --from=builder /usr/bin/caddy /usr/bin/caddy

Fantastic! Now I not only need to maintain the Caddy version and configuration, but also an additional container image and CI pipeline that needs to be kept up-to-date and rebuilt regularly!

I’m skipping this step for now.

# Metrics

Let’s shift the focus to another aspect of a cloud-native deployment: monitoring. To get OpenMetrics-compatible data (suitable for Prometheus and friends) out of Nginx, an external exporter is required. Caddy already has this feature built-in (it’s one of the “standard modules”).

The following Caddy config snippet exposes the metrics path on a separate port (here: 8081). This makes it easy to scrape the metrics internally (e.g. inside a Kubernetes cluster), but avoids exposing the metrics globally (without authentication), which is the case with the default configuration. This snippet also sets up a small healthcheck endpoint, which can be useful for health probes from the orchestration manager (e.g. Kubernetes Readiness Probes or Nomad Service Checks).

# Global config options
# https://caddyserver.com/docs/caddyfile/options
{
	# turn off admin API endpoint
	admin off
	# enable metrics
	# https://caddyserver.com/docs/metrics
	servers {
		metrics
	}
}

# Internal endpoint for serving healthchecks etc.
:8081 {
	# https://caddyserver.com/docs/caddyfile/directives/respond
	respond /healthz 200
	# no log output for internal healthchecks
	log {
		output discard
	}
	# expose metrics endpoint (disabled by default because we turn off the admin API)
	# https://caddyserver.com/docs/caddyfile/directives/metrics
	metrics /metrics
}

Let’s give it a try:

$ curl -i -X GET localhost:8081/healthz
HTTP/1.1 200 OK
Server: Caddy
Date: Tue, 18 Oct 2022 18:59:01 GMT
Content-Length: 0

$ curl -i -X GET localhost:8081/metrics
HTTP/1.1 200 OK
Content-Type: text/plain; version=0.0.4; charset=utf-8
Server: Caddy
Date: Tue, 18 Oct 2022 18:59:07 GMT
Transfer-Encoding: chunked

caddy_http_request_duration_seconds_bucket{code="200",handler="reverse_proxy",method="GET",server="srv0",le="0.05"} 0
caddy_http_request_duration_seconds_bucket{code="200",handler="reverse_proxy",method="GET",server="srv0",le="0.1"} 0
caddy_http_request_duration_seconds_bucket{code="200",handler="reverse_proxy",method="GET",server="srv0",le="0.25"} 1
caddy_http_request_duration_seconds_bucket{code="200",handler="reverse_proxy",method="GET",server="srv0",le="0.5"} 1

# (abbreviated output)

The healthcheck endpoint works as expected. The metrics endpoint also returns data, but this data is not very useful because it just has a rather uninformative server="srv0" label. Ideally, I’d like to be able to distinguish the metrics based on hostnames / vhosts. This (obvious?) feature is currently not supported, but has been requested on Caddy’s issue tracker. Additionally, the way Caddy generates metrics appears to be implemented inefficiently, to the point that a feature toggle was introduced to disable the metrics endpoint. While these issues will probably be fixed eventually, they definitely do not leave a good impression for a “modern” web server. However, it is laudable that the Caddy docs have a dedicated page explaining what each of these metrics means.
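
To see which labels are actually available for distinguishing requests, the sample lines above can be taken apart with a few lines of Python. This is a deliberately minimal sketch that only handles simple lines like the ones shown (no escaped quotes, no timestamps); for anything serious, a proper parser for the exposition format is the better choice. The function name `parse_sample` is made up.

```python
import re
from typing import Dict, Tuple

SAMPLE_LINE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>.*)\}\s+(?P<value>\S+)$'
)
LABEL_PAIR = re.compile(r'(\w+)="([^"]*)"')


def parse_sample(line: str) -> Tuple[str, Dict[str, str], float]:
    """Split one labeled metric line into (metric name, labels, value)."""
    match = SAMPLE_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"not a labeled sample line: {line!r}")
    labels = dict(LABEL_PAIR.findall(match.group("labels")))
    return match.group("name"), labels, float(match.group("value"))


if __name__ == "__main__":
    line = (
        'caddy_http_request_duration_seconds_bucket{code="200",'
        'handler="reverse_proxy",method="GET",server="srv0",le="0.25"} 1'
    )
    name, labels, value = parse_sample(line)
    print(name, sorted(labels), value)
```

The only label that identifies the site is server, and it carries the generic value srv0 rather than a hostname.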

# Putting it all together

Finally, let’s put all of the small configuration snippets together into a Caddyfile:

# Global config options
# https://caddyserver.com/docs/caddyfile/options
{
	# turn off admin API endpoint
	admin off
	# enable metrics
	# https://caddyserver.com/docs/metrics
	servers {
		metrics
	}
}

# Internal endpoint for serving healthchecks etc.
:8081 {
	# https://caddyserver.com/docs/caddyfile/directives/respond
	respond /healthz 200
	# no log output for internal healthchecks
	log {
		output discard
	}
	# expose metrics endpoint (disabled by default because we turn off the admin API)
	# https://caddyserver.com/docs/caddyfile/directives/metrics
	metrics /metrics
}

# Listen explicitly for HTTP connections, such that Caddy doesn't start its automatic certificate setup.
:8080 {
	@pathWithoutFile {
		path_regexp ^(.*)/$
	}
	# Rewrite paths ending with '/' to '/index.html',
	# because the S3 backend (Minio) sends an ugly XML directory listing otherwise.
	rewrite @pathWithoutFile /blog-cubieserver-de{path}index.html
	# Rewrite other requests to have the appropriate bucket prefix, too.
	# Note that this is mutually exclusive with the previous rule, see
	# https://caddy.community/t/composing-in-the-caddyfile/8291
	rewrite * /blog-cubieserver-de{path}

	# Only allow GET and HEAD requests, so no one can send API calls to the backend
	# https://caddyserver.com/docs/caddyfile/matchers#method
	@disallowedMethods {
		not method GET HEAD
	}
	# https://caddyserver.com/docs/caddyfile/directives/respond
	respond @disallowedMethods "HTTP Method Not Allowed" 405

	# Set up proxy to the backend endpoint
	# https://caddyserver.com/docs/caddyfile/directives/reverse_proxy
	reverse_proxy {
		# the upstream proxy (without URI!)
		to https://s3.cubieserver.de

		# hide these headers (from S3/Minio) when sending response to client
		header_down -X-Amz-*
		header_down -Server

		# set appropriate Host header when sending upstream request
		# this is required because the upstream has a different hostname
		header_up Host {upstream_hostport}
	}
}

This Caddyfile has 30 lines of content (grep -v -E '[[:space:]]*#' Caddyfile | grep . | wc -l). The Nginx config at the top of this post has 32 - but that one also implements a local file cache! If we omit the statements relating to caching, it comes in at 20 lines - one third fewer than the Caddyfile!
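
For reference, the counting rule of that grep pipeline can be written out in Python. This is just a sketch that mirrors the pipeline as I read it: the unanchored pattern drops every line that contains a # anywhere (not only pure comment lines), and grep . drops empty lines. The function name `count_config_lines` is made up.

```python
def count_config_lines(text: str) -> int:
    """Count lines the way the grep pipeline above does."""
    count = 0
    for line in text.splitlines():
        if "#" in line:
            # grep -v -E '[[:space:]]*#' removes any line containing '#'
            continue
        if line == "":
            # grep . removes empty lines (whitespace-only lines still count)
            continue
        count += 1
    return count


if __name__ == "__main__":
    sample = ":8081 {\n\t# healthcheck endpoint\n\trespond /healthz 200\n\n}\n"
    print(count_config_lines(sample))
```

A config line with a trailing comment would be dropped by this rule as well; neither config in this post contains such a line, so the numbers are not affected.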

# Conclusion

I was definitely not expecting this result when I started this exploration quest. Using the number of config lines is certainly not the best comparison metric. But to be completely honest, I’m kind of disappointed that I already had to discover two significant stumbling blocks while configuring Caddy (multiple rewrite directives and metrics). Maybe I simply had too high expectations for Caddy after all the hype I have been hearing online.

In addition, after this short experience I’m also not a fan of the Caddyfile format, because depending on the level of detail, you need to use different kinds of directives. For example, basic URL rewriting can be achieved with:

	rewrite * /blog-cubieserver-de{path}

But for slightly more advanced regex matching, I cannot use the same syntax:

	rewrite ^(.*)/$ /blog-cubieserver-de{path}index.html

Instead, I need to define a separate named matcher:

	@pathWithoutFile {
		path_regexp ^(.*)/$
	}
	rewrite @pathWithoutFile /blog-cubieserver-de{path}index.html

I can totally understand this behavior from a technical point of view, but from the user’s perspective it is quite confusing.

Overall, I’m not satisfied with the experience. For now, it looks like I will stick with Nginx - at least until Caddy has “proper” metrics support.