Every night I get a message once the backup job on my server has finished. I was wondering whether it’s just my imagination or whether these jobs are actually taking longer and longer. Since the backup job is a systemd oneshot service (triggered by a systemd timer), I thought there must be an easy way to find out. And indeed there is! Thanks to the journal, I have all the logs of the last two years.
You can view all logs available on your system like this:
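For use from a script, the same query can be wrapped in Python. This is only a minimal sketch: the function names are made up for illustration, and the unit name you pass in (for example `backup.service`) is whatever your own service is called. It asks journalctl for one unit's entries with ISO timestamps, which are easier to parse later.

```python
import subprocess


def journal_command(unit):
    """Build the journalctl call for a single unit (ISO timestamps, no pager)."""
    return ["journalctl", "--unit", unit, "--no-pager", "--output", "short-iso"]


def read_journal(unit):
    """Run journalctl for the given unit and return its output as a list of lines."""
    result = subprocess.run(
        journal_command(unit), capture_output=True, text=True, check=True
    )
    return result.stdout.splitlines()
```

Calling `read_journal("backup.service")` would then return every journal line recorded for that (hypothetical) unit.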
So I wrote a Python helper script to parse those logs and find the outliers.
It leverages the fact that journalctl will print the following message before and after each unit invocation:
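The pairing idea can be sketched roughly as follows. This is not the original script, just an illustration under a few assumptions: the lines come from `journalctl -o short-iso`, systemd marks the start with a "Starting …" message and the end with "Finished …" (older systemd versions log "Started …" for oneshot units, as far as I know), and "outlier" simply means "longer than some factor times the median". The function names and the factor are hypothetical.

```python
import re
import statistics
from datetime import datetime

# Assumed line shape (journalctl -o short-iso):
#   2024-01-01T02:00:00+0100 host systemd[1]: Starting Backup job...
LINE_RE = re.compile(
    r"^(?P<ts>\S+) \S+ systemd\[1\]: (?P<event>Starting|Started|Finished) "
)


def parse_durations(lines):
    """Pair each start message with the next end message.

    Returns a list of (start_time, duration_in_seconds) tuples.
    """
    runs = []
    started_at = None
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # not a systemd start/end message
        ts = datetime.strptime(match["ts"], "%Y-%m-%dT%H:%M:%S%z")
        if match["event"] == "Starting":
            started_at = ts
        elif started_at is not None:
            runs.append((started_at, (ts - started_at).total_seconds()))
            started_at = None
    return runs


def find_outliers(runs, factor=2.0):
    """Return the runs that took longer than `factor` times the median duration."""
    if not runs:
        return []
    median = statistics.median(seconds for _, seconds in runs)
    return [(start, seconds) for start, seconds in runs if seconds > factor * median]
```

Feeding it the output lines of journalctl for the backup unit gives one duration per run, and `find_outliers` narrows that down to the suspiciously slow nights.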
Which yields the following example output:
In case you have data accuracy issues (e.g. because the system rebooted while the backup was running), you will need to clean the journalctl data manually before feeding it to the Python script, or simply skip those entries with a regex in the loop.
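One way to do the regex-based skipping is to drop any run whose start message is never followed by an end message before the next boot. The sketch below assumes that journalctl separates boots with a marker line such as `-- Reboot --` or `-- Boot <id> --` (the exact wording seems to depend on the systemd version) and that start and end messages begin with "Starting" and "Finished"/"Started". All names are hypothetical.

```python
import re

# Assumed boot separator lines printed by journalctl between boots.
BOOT_RE = re.compile(r"^-- (Reboot|Boot [0-9a-f]+) --$")
START_RE = re.compile(r"systemd\[1\]: Starting ")
END_RE = re.compile(r"systemd\[1\]: (Finished|Started) ")


def complete_runs(lines):
    """Return (start_line, end_line) pairs, skipping runs cut short by a reboot."""
    pairs = []
    pending = None  # start line still waiting for its end line
    for line in lines:
        line = line.rstrip("\n")
        if BOOT_RE.match(line):
            pending = None  # the run in progress never finished
        elif START_RE.search(line):
            pending = line  # a newer start replaces any unfinished one
        elif END_RE.search(line) and pending is not None:
            pairs.append((pending, line))
            pending = None
    return pairs
```

Only the surviving pairs then need to be turned into durations, so an interrupted backup never shows up as a bogus multi-hour run.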
For the long term, it would be nice to feed this data into a Prometheus / Grafana setup, but that’s a job for another day.