Sequential Unit Startup in Systemd

At work we are running some (new) nodes in Puppet masterless mode. This means that instead of querying a Puppet server, they collect the resources and compile the catalog themselves before applying it. That requires the Puppet and Hiera code to be present on the machine, for which we use g10k (a blazing fast reimplementation of r10k in Go) and a custom postrun script that links the appropriate modules into each environment.

To execute these tasks regularly, we deployed systemd services and timers. I set up two services: one for fetching the Puppet modules (with g10k and the postrun script) and another one for applying the configuration (with puppet apply).
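The timers are not the focus of this post. As a rough sketch, a timer that triggers apply.service could look like the following. The file name apply.timer and the 30-minute schedule are assumptions made for illustration, not our actual values.

```ini
# apply.timer (hypothetical name; a timer activates the service
# with the same base name unless Unit= says otherwise)
[Unit]
Description=Run Puppet Apply periodically

[Timer]
# Assumed schedule: every hour at minute 0 and minute 30.
OnCalendar=*:0/30
# If the machine was powered off at a scheduled time, run once after boot.
Persistent=true

[Install]
WantedBy=timers.target
```

Such a timer would be activated with systemctl enable --now apply.timer.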

fetch-modules.service: (full source)

[Unit]
Description=Fetch Updates for Puppet Modules with g10k and Postrun
Requires=local-fs.target

[Service]
Type=oneshot
ExecStart=/usr/local/bin/g10k -config /etc/puppetlabs/r10k/g10k.yaml
ExecStart=/usr/bin/python3 /etc/puppetlabs/r10k/postrun/postrun.py

[Install]
WantedBy=multi-user.target

apply.service: (full source)

[Unit]
Description=Puppet Apply in Masterless mode
Requires=local-fs.target
Wants=fetch-modules.service

[Service]
Type=oneshot

SuccessExitStatus=0 2
ExecStart=/opt/puppetlabs/bin/puppet \
                                     apply \
                                     --detailed-exitcodes \
                                     --log_level err \
                                     /etc/puppetlabs/code/environments/production/manifests/site.pp

[Install]
WantedBy=multi-user.target

Based on the documentation quoted below, this configuration seemed logical to me.

Type=oneshot ensures “the process exits before systemd starts follow-up units” (see systemd.service(5)). The tools we run are simple one-off jobs, so waiting until they are done is the behavior we want.

Wants=fetch-modules.service (in apply.service) means that “Units listed in this option will be started if the configuring unit is. However, if the listed units fail to start or cannot be added to the transaction, this has no impact on the validity of the transaction as a whole.” (see systemd.unit(5)). Again, this is precisely the desired behavior: if cloning a new version of the code fails, Puppet simply applies the old one instead, so the systems always stay in a consistent state.

But it didn’t quite work. Sure, running systemctl start apply launched the service(s), but we always got weird errors like:

systemd[1]: Starting Puppet Apply in Masterless mode...
puppet[3874]: Error: Evaluation Error: Error while evaluating a Function Call, Could not find class ::vision_default (file: puppetlabs/code/environments/production/manifests/site.pp, line: 7, column: 3)
systemd[1]: apply.service: Main process exited, code=exited, status=1/FAILURE
systemd[1]: Failed to start Puppet Apply in Masterless mode.
systemd[1]: apply.service: Unit entered failed state.
systemd[1]: apply.service: Failed with result 'exit-code'.

This seemed odd. Initially I assumed our postrun script was not flushing its data to disk and systemd was executing the units in too quick succession. But even adding an ExecStartPre=/bin/sync to apply.service did not help.

Then it dawned on me: systemd was starting both services simultaneously. Wants= only pulls fetch-modules.service into the same transaction and says nothing about ordering, so apply.service was launched immediately instead of waiting for the fetch to finish.

Reading through the systemd documentation (systemd.unit(5)) confirmed this assumption:

Before=, After=

A space-separated list of unit names. Configures ordering dependencies between units. If a unit foo.service contains a setting Before=bar.service and both units are being started, bar.service’s start-up is delayed until foo.service is started up. Note that this setting is independent of and orthogonal to the requirement dependencies as configured by Requires=. It is a common pattern to include a unit name in both the After= and Requires= option, in which case the unit listed will be started before the unit that is configured with these options.

Aha! So the solution is simply to add an After=fetch-modules.service line to apply.service. Sometimes, things can be so trivial.

 Requires=local-fs.target
 Wants=fetch-modules.service
+After=fetch-modules.service
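
For reference, this is roughly what the [Unit] section of apply.service looks like with the change applied. The comments are added here for illustration and only restate the systemd.unit(5) semantics quoted above.

```ini
[Unit]
Description=Puppet Apply in Masterless mode
Requires=local-fs.target
# Requirement: pull fetch-modules.service into the transaction.
# With Wants=, a failure of that unit does not fail this one.
Wants=fetch-modules.service
# Ordering: when both units are started, wait until
# fetch-modules.service has finished starting.
After=fetch-modules.service
```

Because the requirement is still Wants= and not Requires=, apply.service is started once fetch-modules.service has finished, even if the fetch failed. The fallback to the previously fetched code therefore stays intact.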