Getting started with Kopia and Backblaze

Posted on Apr 3, 2022

For years, I have been using restic as my backup software of choice for servers and personal machines. There are several aspects I like about restic: it’s lightweight, fast and has a useful all-in-one CLI. On the technical side, it supports client-side encryption out-of-the-box and uses content-addressable storage to implement incremental backups and snapshots.

Recently, I became aware of a new kid on the block: Kopia. At first glance, it seems to offer the same benefits I just mentioned about restic, plus a few more niceties:

it supports compression - a feature that has been long requested for restic
it has an (optional) GUI - useful for backups for non-techy friends and family
it has a stable Go API - allows building anything on top of Kopia

Thus, I decided that I would give it a go on my new homeserver. In this guide I’ll go through Kopia’s basic usage on the command line and how to set it up with Backblaze B2. In between I will also mention some of its advanced concepts, such as compression and retention policies.

All of this is nicely documented in the Kopia docs, but if you are interested in an opinionated introduction, follow right along.

# Installation

The first step is downloading the kopia binary, for which there are several options available in the Kopia documentation. Type kopia --version into your shell to verify the CLI is installed correctly.

$ kopia --version
0.10.6 build: 766cb57160477fba0935634e98c2bdfd440557f3 from: kopia/kopia

# Configuration

Next, we need to create a storage bucket on Backblaze B2. We don’t need to enable “Default Encryption”, since we’ll be using client-side encryption with Kopia.

Creating a new bucket in the Backblaze B2 Web UI

And then create a new application key which has access only to the bucket created in the previous step:

Adding a new application key for accessing the Backblaze B2 bucket

Make sure to copy the keyID and applicationKey which are displayed after clicking Create. We’ll need them in the next step to configure Kopia for accessing Backblaze.

Additionally, we’ll also need a “repository password” for Kopia – which is a bit confusing, because this password is not used for authenticating to Backblaze, but instead this is the secret used for encrypting the data on the client-side before sending it to Backblaze (or any other storage backend). You can use your favorite password manager or a tool like pwgen to generate this secret - and make sure to store it somewhere safe! We will need it in case we want to access the Kopia backups from another machine.
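
If pwgen is not at hand, a random secret can also be generated with standard tools. This is only a sketch: it assumes a system with /dev/urandom and the base64 utility, and the function name generate_password is made up for this example.

```shell
# Generate a random repository password: 32 random bytes, base64-encoded.
# Assumption: /dev/urandom and the `base64` tool are available.
generate_password() {
  head -c 32 /dev/urandom | base64 | tr -d '\n'
}

REPOSITORY_PASSWORD="$(generate_password)"
echo "generated a password of ${#REPOSITORY_PASSWORD} characters"
```

Copy the generated value into your password manager right away, since it only lives in the shell variable.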

$ kopia repository create b2 \
  --bucket=YOUR_BUCKET \
  --key-id=YOUR_KEY_ID \
  --key=YOUR_APPLICATION_KEY \
  --password=REPOSITORY_PASSWORD

Initializing repository with:
  block hash:          BLAKE2B-256-128
  encryption:          AES256-GCM-HMAC-SHA256
  splitter:            DYNAMIC-4M-BUZHASH
Connected to repository.

Retention:
  Annual snapshots:     3   (defined for this target)
  Monthly snapshots:   24   (defined for this target)
  Weekly snapshots:     4   (defined for this target)
  Daily snapshots:      7   (defined for this target)
  Hourly snapshots:    48   (defined for this target)
  Latest snapshots:    10   (defined for this target)
Compression disabled.

To find more information about default policy run 'kopia policy get'.
To change the policy use 'kopia policy set' command.

Reference: kopia repository connect b2

There is lots of output here, but for now the most important line is “Connected to repository”.

You should be aware that Kopia stores the credentials used to connect to the storage backend in plaintext in ${HOME}/.config/kopia/repository.config (or $KOPIA_CONFIG_PATH, if set) until you run kopia repository disconnect. This behavior can be disabled altogether with the --persist-credentials=false parameter.

Similarly, the repository password will be stored in ${HOME}/.config/kopia/repository.config.kopia-password. If you want to avoid this, you can set the environment variable $KOPIA_PASSWORD instead of using the --password argument.
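
One way to use $KOPIA_PASSWORD without typing the secret into the shell history is to read it from a file that only your user can access. The following is a minimal sketch under that assumption; the helper load_kopia_password and the file path in the usage comment are hypothetical, not something Kopia provides.

```shell
# Read the repository password from a file and export it as KOPIA_PASSWORD,
# so that the --password argument is not needed.
load_kopia_password() {
  # $1: path to a file that contains only the repository password
  if [ ! -r "$1" ]; then
    echo "cannot read password file: $1" >&2
    return 1
  fi
  KOPIA_PASSWORD="$(cat "$1")"
  export KOPIA_PASSWORD
}

# Usage (commented out so that nothing runs by accident; the path is an example):
# load_kopia_password "$HOME/.kopia-repository-password" &&
#   kopia repository connect b2 \
#     --bucket=YOUR_BUCKET \
#     --key-id=YOUR_KEY_ID \
#     --key=YOUR_APPLICATION_KEY
```

The password file itself should of course be protected, for example with chmod 600.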

Kopia also provides a command to check that the connection to the storage backend works as expected (great, I love self-checks!):

$ kopia repository validate-provider
Validating blob list responses
Validating non-existent blob responses
Writing blob (5000000 bytes)
Validating conditional creates...
Validating list responses...
Validating partial reads...
Validating full reads...
Validating metadata...
Running concurrency test for 30s...
All good.
Cleaning up temporary data...

Looks good!

There are two things in the output shown above which we still need to address: the compression policy and the retention policy.

As per Kopia’s documentation, compression can be enabled either per directory with kopia policy set <path> --compression=zstd or for all directories with kopia policy set --global --compression=zstd. Enabling compression can reduce the required upload bandwidth (potentially making backups faster) and storage requirements (lowering costs), especially if you have a large amount of compressible data, such as text documents, log files or database dumps. However, if your data is already compressed in one form or another (such as videos, music, pictures or compressed archives), this likely won’t make much of a difference and might even make the backups slower due to the additional CPU overhead. For this reason, Kopia allows setting minimum and maximum file sizes for compression as well as including and excluding certain file extensions from compression; see kopia policy set --help.

As a concrete example, these are the compression statistics for my generic 100GiB dataset, which has a mix of pictures, documents, music and other files (basically $HOME):

$ kopia content stats
Count: 116549
Total Bytes: 95.3 GB
Total Packed: 91.4 GB (compression 4.1%)
By Method:
  (uncompressed)         count: 64619 size: 47.9 GB
  zstd                   count: 51930 size: 47.4 GB packed: 43.5 GB compression: 8.2%

Roughly half of the data was left uncompressed by Kopia, because it noticed that the data is not compressible. On the other half Kopia achieved a meager compression ratio of 8.2% with the zstd compression algorithm.
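
The percentages in that output are the fraction of bytes saved, which can be checked by hand from the reported sizes. A small sketch, assuming awk is available; the function name savings is made up for this example:

```shell
# Savings = (original - packed) / original, with sizes in GB as reported above.
savings() {
  awk -v orig="$1" -v packed="$2" \
    'BEGIN { printf "%.1f%%\n", (orig - packed) / orig * 100 }'
}

savings 47.4 43.5   # the zstd-compressed half: prints 8.2%
savings 95.3 91.4   # the whole repository:     prints 4.1%
```

Both lines save the same 3.9 GB; the overall figure is lower only because it is measured against the full 95.3 GB.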

For a detailed discussion about compression including benchmarks check Kopia’s dedicated documentation page.

The retention policy configures how many snapshots Kopia should preserve for this particular repository. The default retention policy (shown above) keeps quite a lot of snapshots; I prefer the following, lighter policy (assuming daily backups):

keep the last 30 daily backups
keep the last 52 weekly backups
keep the 5 most recent backups

Just like for compression previously, this policy can either be enabled per directory or globally for all directories:

$ kopia policy set --global \
    --keep-annual 0 \
    --keep-monthly 0 \
    --keep-weekly 52 \
    --keep-daily 30 \
    --keep-hourly 0 \
    --keep-latest 5
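
The command above sets the policy globally. A per-directory override might look like the following sketch, which reuses the same flags with /mnt as the target. The kopia_cmd wrapper and its RUN switch are hypothetical helpers that print the command for review instead of executing it, and the shorter daily window is an arbitrary illustration.

```shell
# Dry-run helper: prints the command unless RUN=1 is set.
# RUN is a made-up switch for this sketch, not a Kopia setting.
kopia_cmd() {
  if [ "${RUN:-0}" = 1 ]; then
    kopia "$@"
  else
    echo "kopia $*"
  fi
}

# Same flags as the global policy, applied to /mnt only.
kopia_cmd policy set /mnt \
    --keep-weekly 52 \
    --keep-daily 7 \
    --keep-latest 5
```

Run it once as-is to check the printed command, then again with RUN=1 to apply it.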

# Backup

Great, at this point we are ready to perform our first backup! We just need to point Kopia to the directory it should take a backup of:

$ kopia snapshot create /mnt
Snapshotting root@homelab:/mnt ...
 * 0 hashing, 2067 hashed (1.4 GB), 75630 cached (111 GB), uploaded 412 MB, estimated 112.4 GB (100.0%) 0s left
Created snapshot with root k0cf0e6f6e91eab5f0338a71042670d5e and ID 4b709205bb0a78a755e5dc8c469b0586 in 3h1m18s
Running full maintenance...
Looking for active contents...
Looking for unreferenced contents...
Previous content rewrite has not been finalized yet, waiting until the next blob deletion.
Looking for unreferenced blobs...
Deleted total 0 unreferenced blobs (0 B)
Cleaned up 0 logs.
Cleaning up old index blobs which have already been compacted...
Finished full maintenance.

We can also give the snapshot a name with the --description parameter. If we want to exclude this snapshot from automatic deletion by the retention policy configured above, we can use --pin.
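
For scripted backups, a dated description makes snapshots easier to tell apart. This is a minimal sketch: SOURCE_DIR, DESCRIPTION, the kopia_cmd wrapper and its RUN switch are names invented for this example, and by default the wrapper only prints the command.

```shell
# Dry-run helper: prints the command unless RUN=1 is set.
# RUN is a made-up switch for this sketch, not a Kopia setting.
kopia_cmd() {
  if [ "${RUN:-0}" = 1 ]; then
    kopia "$@"
  else
    echo "kopia $*"
  fi
}

SOURCE_DIR="${SOURCE_DIR:-/mnt}"
DESCRIPTION="nightly backup $(date +%Y-%m-%d)"

# --description is the parameter mentioned above; everything else is illustrative.
kopia_cmd snapshot create "$SOURCE_DIR" --description "$DESCRIPTION"
```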

Taking the first snapshot usually takes a while, because Kopia needs to upload all the data. Subsequent snapshots will be much faster, because only the changes will need to be uploaded (a.k.a. incremental backups).

You will also notice that Kopia performs some maintenance operations immediately after creating a snapshot (for example, it checks whether there is unused data in the repository). I personally think this behavior makes sense for most users, unlike e.g. restic, where these maintenance operations need to be triggered explicitly by the user. However, in certain circumstances where you want to have more control and predictability (e.g. while scripting), these automatic maintenance operations can be disabled with command-line arguments.

Reference: kopia snapshot create

# Restore

Creating backups is of course only one half of the equation. Most importantly, we also need to be able to restore our data. Kopia allows mounting each individual snapshot as a regular filesystem, which makes recovering a subset of files (or all files) extremely easy. Since it uses FUSE for this operation, make sure you have the fusermount tool installed (usually available in the fuse or fuse3 distribution packages); otherwise you might see errors like stat /bin/fusermount: no such file or directory.

# list recent snapshots to find the relevant ID
$ kopia snapshot list
root@homelab:/mnt
  2022-04-02 10:29:01 CEST kbb2073ede2842ac2594718786d8c95cd 111.6 GB drwxr-xr-x files:77096 dirs:32500 (latest-1,daily-1,weekly-1)

# mount the snapshot (in the background so we can continue to use our shell)
$ mkdir /tmp/kopia-restore
$ kopia mount SNAPSHOT_ID /tmp/kopia-restore &
$ ls /tmp/kopia-restore/
pvc-319f7743-d3d2-482e-bce3-81bae9ca5e23_minio_minio
pvc-8afda0e6-4db1-4d9b-86a0-ef40822e7da6_nextcloud_nextcloud-nextcloud
pvc-67ddb324-95d9-4492-81fc-99c70ee0a87f_ejabberd_ejabberd-data-gdz077lr
pvc-ab9b5053-a997-4e17-8c9b-9393f83d34f7_gitea_data-gitea-0

# hint: use the "archive" option when copying files to preserve their modes and timestamps
$ cp -a /tmp/kopia-restore/my-important-file.txt /home/jack

# don't forget to unmount before leaving your shell
$ fg
Ctrl-C

Reference: kopia mount

Alternatively, if we already know exactly which folder or file we want to retrieve, we can use Kopia’s restore command directly:

$ kopia restore SNAPSHOT_ID/path/to/my/important/file.txt /tmp/kopia-restore/
Restoring to local filesystem (/tmp/kopia-restore) with parallelism=8...
Processed 1 (0 B) of 1 (72.2 MB).
Processed 2 (72.2 MB) of 1 (72.2 MB) 11.8 MB/s (100.0%) remaining 0s.
Processed 2 (72.2 MB) of 1 (72.2 MB) 11.8 MB/s (100.0%) remaining 0s.
Restored 1 files, 1 directories and 0 symbolic links (72.2 MB).

Reference: kopia restore
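
When testing restores, it can be reassuring to compare a restored file against a copy that is known to be good. This is an extra step of my own rather than a Kopia feature; the sketch below assumes the standard cmp tool, and the function name verify_restore as well as the paths in the usage comment are hypothetical.

```shell
# Compare a restored file with a reference copy, byte by byte.
verify_restore() {
  # $1: reference file, $2: restored file
  if cmp -s "$1" "$2"; then
    echo "OK: $2 matches $1"
  else
    echo "MISMATCH: $2 differs from $1" >&2
    return 1
  fi
}

# Usage (example paths only):
# verify_restore /home/jack/my-important-file.txt /tmp/kopia-restore/my-important-file.txt
```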

# Conclusion

After this first experiment with Kopia, I have to say I’m extremely impressed. The CLI tool is very ergonomic (especially when you’re already used to restic’s repository/snapshot/retention concepts) and everything seems very well documented.

I really hope this tool will have a long and healthy career.

Happy backing up!