Update:
It looks like the kolla-ansible team has now also detected the problem. There is a bug report (https://bugs.launchpad.net/zun/+bug/2007142) and the Zun guide in the kolla-ansible documentation (https://docs.openstack.org/kolla-ansible/yoga/reference/compute/zun-guide.html) has been updated with a solution. The idea is to pin the Docker version to 20.x in the kolla-ansible configuration by setting the following two parameters:
docker_apt_package_pin: "5:20.*"
docker_yum_package_pin: "20.*"
After this, you need to run bootstrap-servers and deploy. Be aware that bootstrap-servers will restart all running containers, so it is probably a good idea to bootstrap your control and compute nodes one by one.
kolla-ansible bootstrap-servers
kolla-ansible deploy
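To bootstrap the nodes one at a time, a small shell helper along these lines might do. It is only a sketch: the function name `bootstrap_one_by_one` and the `KOLLA_ANSIBLE` override variable are my own inventions, and the host names in the usage line are placeholders for whatever your inventory contains. It relies on kolla-ansible's `--limit` option to restrict a run to a single host.

```shell
# Sketch: run bootstrap-servers against one host at a time.
# KOLLA_ANSIBLE is a hypothetical override; set it to "echo kolla-ansible"
# to print the commands instead of running them.
bootstrap_one_by_one() {
    local inventory="$1"
    shift
    local host
    for host in "$@"; do
        # Stop at the first failing host so a broken node is noticed early.
        ${KOLLA_ANSIBLE:-kolla-ansible} -i "$inventory" bootstrap-servers --limit "$host" || return 1
    done
}
```

Usage, assuming an inventory file named multinode: `bootstrap_one_by_one multinode control01 compute01`. Once every node is back up, run the deploy as usual.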
This is just a very short post. While deploying Zun in my Kolla-Ansible OpenStack staging cluster, I came across a problem: running "kolla-ansible deploy" failed. A quick investigation showed that the Docker service on both compute nodes failed to start. The reason is that Zun adds a "cluster-store" option to /etc/docker/daemon.json:
[root@compute01 ~]# cat /etc/docker/daemon.json
{
  "bridge": "none",
  "cluster-store": "etcd://192.168.20.141:2379,192.168.20.142:2379,192.168.20.143:2379",
  "insecure-registries": [
    "172.28.7.140:4000"
  ],
  "ip-forward": false,
  "iptables": false,
  "log-opts": {
    "max-file": "5",
    "max-size": "50m"
  }
}
In Docker versions before 23.x this option was merely deprecated, but in 23.x it has been removed, so Docker fails to start. I figured this out by running dockerd manually:
sudo dockerd
docker Host-discovery and overlay networks with external k/v stores are deprecated. The 'cluster-advertise', 'cluster-store', and 'cluster-store-opt' options have been removed
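If you want to check other nodes for the same problem before Docker gets upgraded, something like the following sketch could help. It simply greps a daemon.json for the three options named in the message above. The function name `find_removed_docker_opts` is made up for this example, and the grep only looks for the option names as quoted JSON keys, so it is not a real JSON parser.

```shell
# Sketch: print every removed Docker daemon option that still appears
# as a key in the given daemon.json (default: /etc/docker/daemon.json).
find_removed_docker_opts() {
    local file="${1:-/etc/docker/daemon.json}"
    local opt
    for opt in cluster-advertise cluster-store cluster-store-opt; do
        # Match the option only as a quoted key followed by a colon,
        # so "cluster-store" does not also match "cluster-store-opt".
        if grep -Eq "\"${opt}\"[[:space:]]*:" "$file"; then
            echo "$opt"
        fi
    done
}
```

Run without arguments on a node, it reads /etc/docker/daemon.json and prints nothing if none of the removed options are present.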
More information on deprecated features in Docker can be found here:
In my staging cluster I was able to fix this easily by downgrading Docker on all my nodes. On a production cluster you should do this node by node and migrate all instances off each compute node beforehand! I also recommend version-locking these packages for now, which requires installing the versionlock plugin:
ansible -i multinode all -m shell -a "dnf install -y python3-dnf-plugin-versionlock; dnf -y remove docker-buildx-plugin; dnf -y install docker-ce-20.10.23-3.el8.x86_64 docker-ce-cli-20.10.23-3.el8.x86_64 docker-ce-rootless-extras-20.10.23-3.el8.x86_64;dnf versionlock docker-ce-20.10.23-3.el8.x86_64 docker-ce-cli-20.10.23-3.el8.x86_64 docker-ce-rootless-extras-20.10.23-3.el8.x86_64 containerd.io docker-compose-plugin" -b
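After the downgrade it may be worth confirming that each node really ended up on a pre-23 Docker. The helper below is a minimal sketch of such a check. The name `docker_is_pre_23` is hypothetical, and it only compares the major version number of a version string such as 20.10.23.

```shell
# Sketch: succeed (exit status 0) when the given Docker version string
# has a major version below 23, i.e. before cluster-store was removed.
docker_is_pre_23() {
    local version="$1"
    local major="${version%%.*}"   # "20.10.23" -> "20"
    [ "$major" -lt 23 ]
}
```

On a node with a running Docker daemon, it can be fed the server version directly: `docker_is_pre_23 "$(docker version --format '{{.Server.Version}}')" && echo "Docker is older than 23"`.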
Due to the disruptive downgrade, some containers on my nodes would not come up healthy. I fixed it by running:
kolla-ansible -i multinode mariadb_recovery
I have also applied this change to my Kolla-Ansible Terraform manifest.