As soon as you have set up your Kubernetes cluster and deployed a few containers, the question arises of how to create backups. A very useful and comfortable tool for this is Velero. It can easily be installed via HELM, and backups and restores each take just a single command. It is important to note that Velero by default creates snapshots of the PVs (persistent volumes), and not all storage classes support this. For example, the nfs-subdir-external-provisioner and the K3s local-path storage classes do not support snapshots. I will be using K3s and the local-path storage class in this tutorial. For such cases, Velero offers File System Backup, which backs up the PVs via restic or Kopia. For more information, see the Velero File System Backup documentation.
You just have to activate it during setup, but more about that later. You can schedule backups, set retention periods, and much more. All backups are stored in an S3 bucket, so you can either rent cloud storage or run a small MinIO server locally.
Please note that I'm running this on my local network, but you can of course run it on a VPS or cloud instance. In that case, make sure to use HTTPS with a trusted certificate, very strong passwords, and additional mechanisms to ensure the safety and integrity of your backups! Let's start with the S3 storage setup.
Setup and installation of S3 Backup storage
As long as your storage is S3-compatible, you can link it to Velero. We have various options here. One of the smallest and simplest would be to run MinIO on a VM. I would not recommend this for production, but depending on the size of your containers it might be just fine. In my case, I have a small TrueNAS Scale server running; you can check out how I installed it in this blog post. There we can easily activate the S3 service, which by the way also just starts a MinIO server. To set this up, I followed the official TrueNAS Scale S3 guide. First we will of course need a dataset, so we log in to TrueNAS Scale and create a clean new dataset: go to Storage, select the three dots next to our zpool, and click on "Add Dataset":
I call mine "minio" and accept all the defaults. Once we have the dataset, we need to activate the S3 service. For this we go to System Settings -> Services. Here we select the pen icon on the right side next to S3:
Here we need to select the dataset that we just created, confirm the warning message, and enter an access and a secret key. Note both down, as we will need them later to log in to the MinIO Console and to configure Velero. To enable HTTPS, which I highly recommend, select a valid certificate that you imported earlier from the dropdown (rather than the default one) and enter the IP of your TrueNAS server in the URI field. For this tutorial, however, we will not use HTTPS. Finally, we click on Save and activate the S3 service. It is also a good idea to tick the "Start Automatically" box.
Now we create a bucket called "velero". For this, we browse to the MinIO Console on our TrueNAS server. MinIO runs on ports 9000/9001. In my case that is:
http://192.168.2.2:9000
We log in using the access and secret key that we just configured in the TrueNAS WebUI.
In the MinIO Console we click on Buckets and Create Bucket
Let's see if we can connect to the S3 bucket. For this we will use awscli
sudo yum -y install awscli
To configure awscli we run
aws configure --profile=truenas
A wizard will now ask for the access and secret key. We accept the defaults for region etc. by just pressing ENTER:
[vagrant@rocky8-k3s ~]$ aws configure --profile=truenas
AWS Access Key ID [None]: B9A6AC13381DAC41BAD
AWS Secret Access Key [None]: 42D75A75B3
Default region name [None]:
Default output format [None]:
Now we can list the contents of the S3 bucket by running
aws --profile=truenas --endpoint=http://192.168.2.2:9000 s3 ls s3://velero
Don't be surprised, it is empty. :)
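By the way, had we not created the "velero" bucket in the MinIO Console already, awscli could have done that for us as well:
# Create the bucket via the S3 API (an alternative to the MinIO Console)
aws --profile=truenas --endpoint=http://192.168.2.2:9000 s3 mb s3://velero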
Prerequisites for Velero
Next we need to create an S3 credentials file .credentials-velero with the following content:
cat > .credentials-velero << EOF
[default]
aws_access_key_id = B9A6AC13381DAC41BAD
aws_secret_access_key = <your secret key>
EOF
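Since this file contains your secret key in plain text, it is a good idea to lock down its permissions:
chmod 600 .credentials-velero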
Velero will use this file to authenticate against the S3 bucket. In order to interact with Velero, we need to download the Velero binary. It is available for multiple platforms in the form of a tar file.
Latest release can be found here: https://github.com/vmware-tanzu/velero/releases/latest
At the time of writing this is v1.9.5.
wget https://github.com/vmware-tanzu/velero/releases/download/v1.9.5/velero-v1.9.5-linux-amd64.tar.gz
Extract it and move the "velero" binary to the /usr/local/bin folder
tar xvzf velero-v1.9.5-linux-amd64.tar.gz
sudo mv velero-v1.9.5-linux-amd64/velero /usr/local/bin/
Velero expects a kubeconfig file with cluster-admin privileges, so it is best to do this on a machine where kubectl is already installed and configured. If not, please install and configure kubectl; a very simple and convenient way is to use Arkade. In the case of K3s, we need to set the path to the kubeconfig file:
export KUBECONFIG=/etc/rancher/k3s/k3s.yaml
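A quick sanity check that kubectl can reach the cluster and that we really have admin rights:
# List the cluster nodes and check our RBAC permissions
kubectl get nodes
kubectl auth can-i '*' '*' --all-namespaces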
Install Velero
Now we are ready to install Velero. For this we add the Velero HELM repo and install the chart:
helm repo add vmware-tanzu https://vmware-tanzu.github.io/helm-charts
helm repo update
helm install velero vmware-tanzu/velero \
--namespace velero \
--create-namespace \
--set-file credentials.secretContents.cloud=./.credentials-velero \
--set configuration.provider=aws \
--set configuration.backupStorageLocation.name=default \
--set configuration.backupStorageLocation.bucket=velero \
--set configuration.backupStorageLocation.config.region=None \
--set configuration.backupStorageLocation.config.s3ForcePathStyle=true \
--set configuration.backupStorageLocation.config.s3Url=http://192.168.2.2:9000 \
--set configuration.defaultVolumesToFsBackup=true \
--set snapshotsEnabled=false \
--set deployNodeAgent=true \
--set initContainers[0].name=velero-plugin-for-aws \
--set initContainers[0].image=velero/velero-plugin-for-aws:latest \
--set initContainers[0].imagePullPolicy=IfNotPresent \
--set initContainers[0].volumeMounts[0].mountPath=/target \
--set initContainers[0].volumeMounts[0].name=plugins
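If you prefer a values file over the long list of --set flags, the same configuration can be written as YAML. The following is just a sketch that mirrors the flags above one-to-one; the key structure follows the vmware-tanzu/velero chart at the time of writing, so double-check it against your chart version:
# Write the values file (same settings as the --set flags above)
cat > velero-values.yaml << EOF
credentials:
  secretContents:
    cloud: |
      [default]
      aws_access_key_id = B9A6AC13381DAC41BAD
      aws_secret_access_key = <your secret key>
configuration:
  provider: aws
  backupStorageLocation:
    name: default
    bucket: velero
    config:
      region: None
      # quoted so it stays a string in the BSL config map
      s3ForcePathStyle: "true"
      s3Url: http://192.168.2.2:9000
  defaultVolumesToFsBackup: true
snapshotsEnabled: false
deployNodeAgent: true
initContainers:
  - name: velero-plugin-for-aws
    image: velero/velero-plugin-for-aws:latest
    imagePullPolicy: IfNotPresent
    volumeMounts:
      - mountPath: /target
        name: plugins
EOF
# Install the chart using the values file
helm install velero vmware-tanzu/velero --namespace velero --create-namespace -f velero-values.yaml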
Now monitor the pod deployment
[vagrant@rocky8-k3s ~]$ kubectl get pods -n velero
NAME READY STATUS RESTARTS AGE
node-agent-tvssj 1/1 Running 0 50s
velero-576fb78ffb-747p9 1/1 Running 0 50s
If for whatever reason the pods are not starting or working as expected, check the logs
kubectl logs deployment/velero -n velero
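The node-agent pods, which perform the file system backups, log separately:
kubectl logs daemonset/node-agent -n velero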
If everything is up and running, verify that your client and server versions match
[vagrant@rocky8-k3s ~]$ velero version
Client:
Version: v1.9.5
Git commit: 2b5281f38aad2527f95b55644b20fb169a6702a7
Server:
Version: v1.10.0
# WARNING: the client version does not match the server version. Please update client
In my case the client version didn't match. For some reason, the GitHub "latest" link was not pointing to the latest release, so I quickly installed the v1.10.0 client:
wget https://github.com/vmware-tanzu/velero/releases/download/v1.10.0/velero-v1.10.0-linux-amd64.tar.gz
tar xvzf velero-v1.10.0-linux-amd64.tar.gz
sudo mv velero-v1.10.0-linux-amd64/velero /usr/local/bin/
And now it is all fine
[vagrant@rocky8-k3s ~]$ velero version
Client:
Version: v1.10.0
Git commit: 367f563072659f0bcd809bc33507fd75cd722344
Server:
Version: v1.10.0
To see the Velero configuration we can simply use HELM
helm get values -n velero velero
To see what was actually deployed we can run
kubectl get all -n velero
Example
[vagrant@rocky8-k3s ~]$ kubectl get all -n velero
NAME READY STATUS RESTARTS AGE
pod/node-agent-tvssj 1/1 Running 0 12m
pod/velero-576fb78ffb-747p9 1/1 Running 0 12m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/velero ClusterIP 10.43.179.220 <none> 8085/TCP 12m
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
daemonset.apps/node-agent 1 1 1 1 1 <none> 12m
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/velero 1/1 1 1 12m
NAME DESIRED CURRENT READY AGE
replicaset.apps/velero-576fb78ffb 1 1 1 12m
Now, with Velero completely installed, we can run our first backup job.
Run a backup job
We can either run a backup job to back up all namespaces
velero backup create all --wait
or back up just an individual namespace
velero backup create ghost --include-namespaces ghost --wait
By pressing "CTRL-C" or omitting "--wait", the job will run in the background.
[vagrant@rocky8-k3s ~]$ velero backup create ghost --include-namespaces ghost --wait
Backup request "ghost" submitted successfully.
Waiting for backup to complete. You may safely press ctrl-c to stop waiting - your backup will continue in the background.
..
Backup completed with status: Completed. You may check for more information using the commands `velero backup describe ghost` and `velero backup logs ghost`.
[vagrant@rocky8-k3s ~]$ velero backup create all --wait
Backup request "all" submitted successfully.
Waiting for backup to complete. You may safely press ctrl-c to stop waiting - your backup will continue in the background.
........
Backup completed with status: Completed. You may check for more information using the commands `velero backup describe all` and `velero backup logs all`.
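By the way, backups don't have to be scoped by namespace; a label selector works too. A sketch (the label value depends on how your chart labels its resources, so adjust it to your deployment):
# Back up only resources carrying this label, across all namespaces
velero backup create ghost-by-label --selector app.kubernetes.io/name=ghost --wait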
Once the jobs are completed, we can run additional commands to get more details.
velero backup describe all
To get even more details
velero backup describe all --details
This can be very helpful in case a job fails or shows a warning. In my case both jobs were successful.
Make sure to check the output for lines like:
v1/PersistentVolume:
- pvc-7d4d2a56-e6b2-4fb4-a373-8116de67cae3
- pvc-95eb084e-6ddf-43d5-a22b-4ade78e0fd88
- pvc-c9e5facc-d414-4264-9d9c-7834a9446323
v1/PersistentVolumeClaim:
- ghost/data-roksblog-mysql-0
- ghost/roksblog-ghost
- plausible/data-plausible-analytics-postgresql-0
This tells us that the persistent volumes and claims have been successfully backed up.
To list all backups run
velero backup get
Example
[vagrant@rocky8-k3s ~]$ velero backup get
NAME STATUS ERRORS WARNINGS CREATED EXPIRES STORAGE LOCATION SELECTOR
all Completed 0 0 2022-12-27 22:17:56 +0000 UTC 29d default <none>
ghost Completed 0 0 2022-12-27 22:17:44 +0000 UTC 29d default <none>
By default, a retention policy of 30 days is applied to each backup. This means that after 30 days the backup will be automatically deleted! You can adjust this by adding the parameter "--ttl" to the backup command. The following would store the backup for one week:
velero backup create ghost-ttl7d --include-namespaces ghost --wait --ttl 168h0m0s
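If you want to get rid of a backup before its TTL expires, you can also delete it manually; this removes its data from the S3 bucket as well:
velero backup delete ghost-ttl7d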
If we now list the contents of our S3 bucket, we will see this:
[vagrant@rocky8-k3s ~]$ aws --profile=truenas --endpoint=http://192.168.2.2:9000 s3 ls s3://velero
PRE backups/
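To see the individual objects instead of just the top-level prefix, add --recursive:
aws --profile=truenas --endpoint=http://192.168.2.2:9000 s3 ls s3://velero --recursive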
And in the MinIO console
Now that we have successfully taken a backup, let's destroy our Ghost deployment and restore it.
Restore a backup
First we uninstall our Ghost HELM chart
[vagrant@rocky8-k3s ~]$ helm list -n ghost
WARNING: Kubernetes configuration file is group-readable. This is insecure. Location: /etc/rancher/k3s/k3s.yaml
WARNING: Kubernetes configuration file is world-readable. This is insecure. Location: /etc/rancher/k3s/k3s.yaml
NAME NAMESPACE REVISION UPDATED STATUS CHART APP VERSION
roksblog ghost 8 2022-12-27 09:57:45.120724689 +0000 UTC deployed ghost-19.1.52 5.26.3
[vagrant@rocky8-k3s ~]$ helm uninstall roksblog -n ghost
WARNING: Kubernetes configuration file is group-readable. This is insecure. Location: /etc/rancher/k3s/k3s.yaml
WARNING: Kubernetes configuration file is world-readable. This is insecure. Location: /etc/rancher/k3s/k3s.yaml
release "roksblog" uninstalled
We will also delete the namespace
[vagrant@rocky8-k3s ~]$ kubectl delete ns ghost
namespace "ghost" deleted
Now we start the restore
velero restore create --from-backup ghost
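Velero derives the restore name from the backup name plus a timestamp. To find it, list the restores:
velero restore get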
To check the status of the restore we run
velero restore describe ghost-20221227223404
Example
[vagrant@rocky8-k3s ~]$ velero restore describe ghost-20221227223404
Name: ghost-20221227223404
Namespace: velero
Labels: <none>
Annotations: <none>
Phase: Completed
Total items to be restored: 43
Items restored: 43
Started: 2022-12-27 22:34:04 +0000 UTC
Completed: 2022-12-27 22:34:05 +0000 UTC
Backup: ghost
Namespaces:
Included: all namespaces found in the backup
Excluded: <none>
Resources:
Included: *
Excluded: nodes, events, events.events.k8s.io, backups.velero.io, restores.velero.io, resticrepositories.velero.io, csinodes.storage.k8s.io, volumeattachments.storage.k8s.io, backuprepositories.velero.io
Cluster-scoped: auto
Namespace mappings: <none>
Label selector: <none>
Restore PVs: auto
Existing Resource Policy: <none>
Preserve Service NodePorts: auto
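If a restore reports warnings or errors, the restore logs are the place to look:
velero restore logs ghost-20221227223404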
As we can see, the pods are being restored (watch output):
kubectl get pods -n ghost          rocky8-k3s.fritz.box: Tue Dec 27 22:57:08 2022
NAME READY STATUS RESTARTS AGE
roksblog-ghost-686bc9d555-9zb29 0/1 Running 0 11s
roksblog-mysql-0 0/1 Running 0 11s
HELM also lists our chart as installed again
[vagrant@rocky8-k3s ~]$ helm list -n ghost
WARNING: Kubernetes configuration file is group-readable. This is insecure. Location: /etc/rancher/k3s/k3s.yaml
WARNING: Kubernetes configuration file is world-readable. This is insecure. Location: /etc/rancher/k3s/k3s.yaml
NAME NAMESPACE REVISION UPDATED STATUS CHART APP VERSION
roksblog ghost 1 2022-12-27 22:55:02.709848568 +0000 UTC deployed ghost-19.1.52 5.26.3
And we can browse it just fine
Schedule Backup jobs
We can easily create multiple cron-style backup schedules by running:
# Daily Backups. Run the daily backup every day at 09:30. Please do note that the system runs on UTC time!!!
velero schedule create daily --schedule="30 09 * * *"
# Weekly Backups. Run the following backup job every Sunday at 10:30. Please do note that the system runs on UTC time!!!
velero schedule create weekly --schedule="30 10 * * 0" --include-cluster-resources=true
# Monthly Backups. Run the following backup job on the 1st of every month at 22:30. Please do note that the system runs on UTC time!!! Keep the backups for 8064 hours (336 days).
velero schedule create monthly --schedule="30 22 1 * *" --include-cluster-resources=true --ttl 8064h0m0s
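A schedule only creates the backups at the configured times, and the resulting backups are named after the schedule plus a timestamp. You can also trigger a scheduled backup manually at any time:
# Immediately run a backup based on the "daily" schedule's template
velero backup create --from-schedule daily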
I recommend taking daily, weekly, and monthly backups with individual retention policies. It is also advisable to back up all cluster resources at least once a week for DR purposes. In case you need help with the cron schedules, check out crontab.guru.
And of course we can also look up our schedules:
velero get schedule
Example
[vagrant@rocky8-k3s ~]$ velero get schedule
NAME STATUS CREATED SCHEDULE BACKUP TTL LAST BACKUP SELECTOR PAUSED
daily Enabled 2022-12-28 09:05:40 +0000 UTC 30 09 * * * 0s n/a <none> false
weekly Enabled 2022-12-28 09:05:40 +0000 UTC 30 10 * * 0 0s n/a <none> false
monthly Enabled 2022-12-28 09:22:02 +0000 UTC 30 22 1 * * 8064h0m0s n/a <none> false
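The PAUSED column hints at another handy feature: in recent Velero versions, schedules can be paused and resumed, for example during maintenance windows:
# Temporarily suspend the daily schedule, then re-enable it
velero schedule pause daily
velero schedule unpause daily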
This shows that Velero is very simple to install, configure, and use. And of course you can also use it for disaster recovery. Your K8s cluster is broken? No problem: spin up a new cluster, follow the very same steps to install Velero, point it to your S3 storage, and restore your backups. I will test this in one of my next blog posts, so stay tuned.