Don't use containerd with the btrfs snapshotter
While I was setting up my homelab with k3s, I was looking through the k3s documentation and came across the --snapshotter argument.
It allows changing the mechanism containerd uses for assembling the container image layers and isolating writes inside the container from the host (see containerd snapshot design).
Since my host system uses btrfs as its main filesystem, I thought it made sense to use the btrfs snapshot driver instead of the default overlayfs driver.
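Before switching drivers, it can help to confirm which filesystem actually backs the container storage. Below is a minimal Python sketch, assuming the usual /proc/mounts format ("device mount-point fs-type options dump pass"). It picks the most specific mount point that contains a given path. The helper name fs_type_for_path is hypothetical, and escaped characters in mount points (such as \040 for a space) are not decoded.

```python
from pathlib import PurePosixPath


def fs_type_for_path(path: str, mounts_text: str) -> str:
    """Return the filesystem type of the mount containing `path`.

    `mounts_text` is assumed to be in /proc/mounts format.
    Hypothetical helper for illustration only.
    """
    target = PurePosixPath(path)
    best_mount = None
    best_type = "unknown"
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        mount_point, fs_type = PurePosixPath(fields[1]), fields[2]
        # The mount point must be the path itself or one of its ancestors
        # (a plain string prefix check would wrongly match /boot for /bootstrap).
        if mount_point == target or mount_point in target.parents:
            # Prefer the deepest mount point; on ties, later entries win
            # because a later mount on the same point shadows earlier ones.
            if best_mount is None or len(mount_point.parts) >= len(best_mount.parts):
                best_mount, best_type = mount_point, fs_type
    return best_type
```

On a real host you would pass in the contents of /proc/mounts together with the k3s data directory (by default /var/lib/rancher/k3s, assuming no custom --data-dir); a result of btrfs means the btrfs snapshotter is an option there at all.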
(Side note: another popular snapshot driver is stargz, because it has advanced capabilities like lazy pulling and image optimizations).
The k3s cluster was up and running with the btrfs driver, but I soon noticed exceptionally high CPU utilization of the containerd processes.
Apart from a few maintenance tasks, the container runtime does not have much to do after starting the container, so this left me wondering.
If you want to learn more about the job of the container runtime, I highly recommend Ivan Velichko’s learning series about container managers.
Some searching on the internet revealed this k3s issue, which refers to containerd issue #4217; both describe exactly the problem I’m seeing.
Furthermore, containerd issue #6067 seems to address exactly the same problem, at least to my eyes.
In short, the container runtime regularly collects disk usage statistics from the container using the snapshot driver.
Unfortunately, the btrfs snapshot driver included in containerd does not use btrfs' native quota feature (which would be very efficient, since the filesystem already has all the necessary data), but instead uses some very expensive API calls which result in a full re-scan of all files in the container.
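To illustrate why a re-scan is so costly, here is a minimal Python sketch of walk-based disk usage accounting. It is only an analogy I wrote for this post, not containerd's code (containerd is written in Go, and its implementation differs in detail). Every call stats every file and directory, so the cost grows with the number of files, and the runtime repeats this periodically for every container. The function name walk_disk_usage is hypothetical, and st_blocks is only available on Unix-like systems.

```python
import os


def _entries(root: str):
    """Yield `root` and every path below it (symlinks are not followed)."""
    yield root
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            yield os.path.join(dirpath, name)


def walk_disk_usage(root: str) -> tuple[int, int]:
    """Return (bytes, inodes) used below `root` by stat-ing every entry.

    Hypothetical illustration: nothing is cached, so each call repeats
    the full walk, which is what makes frequent polling expensive.
    """
    seen = set()
    total_bytes = 0
    for path in _entries(root):
        st = os.lstat(path)
        key = (st.st_dev, st.st_ino)
        # Hard links share an inode; count each inode only once.
        if key in seen:
            continue
        seen.add(key)
        # st_blocks counts 512-byte units, i.e. space actually allocated.
        total_bytes += st.st_blocks * 512
    return total_bytes, len(seen)
```

By contrast, btrfs quota groups (qgroups) already track referenced and exclusive bytes per subvolume, so reading them would not require touching each file.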
Even more unfortunate is the fact that containerd issue #4217 has been open for almost two years at the time of writing, so I don’t expect any fixes for this in the near future :-(
Out of curiosity, I quickly checked the available storage drivers for cri-o: while btrfs is available, overlayfs is still the default (even on a btrfs filesystem), and the developers have no plans to change that.
So it seems that the container world will continue using overlayfs on btrfs…