Update:

It looks like the problem has now also been noticed by the kolla-ansible team. There is a bug report at https://bugs.launchpad.net/zun/+bug/2007142, and the Zun section of the kolla-ansible documentation at https://docs.openstack.org/kolla-ansible/yoga/reference/compute/zun-guide.html has been updated with a solution. The idea is to pin the Docker version to 20.x in the kolla-ansible configuration by setting the following two parameters:

docker_apt_package_pin: "5:20.*"
docker_yum_package_pin: "20.*"

After this you need to run bootstrap and deploy. Be aware that the bootstrap will restart all running containers, so it is a good idea to bootstrap your control and compute nodes one by one.

 kolla-ansible bootstrap-servers
 kolla-ansible deploy
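
A node-by-node bootstrap could look roughly like the sketch below. It assumes the `multinode` inventory used further down in this post, and the host names in the usage line are placeholders. `--limit` is the kolla-ansible option for restricting a run to specific hosts. The helper names `run` and `bootstrap_one_by_one` are made up for this example. By default the script only prints the commands (`DRY_RUN=1`), so you can review them first.

```shell
#!/bin/sh
# Sketch: bootstrap nodes one at a time so only one node's containers
# are restarted at any moment. Inventory name and host names are assumptions.
INVENTORY="${INVENTORY:-multinode}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Bootstrap each host given as an argument, stopping at the first failure.
bootstrap_one_by_one() {
    for host in "$@"; do
        run kolla-ansible -i "$INVENTORY" bootstrap-servers --limit "$host" || return 1
    done
}
```

For example, `bootstrap_one_by_one control01 control02 compute01` prints one bootstrap command per host. Set `DRY_RUN=0` to actually run them, and check each node before moving on to the next.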

This is just a very short post. While deploying Zun in my Kolla-Ansible OpenStack staging cluster, I came across a problem: running "kolla-ansible deploy" failed. A quick investigation showed that the docker service on both compute nodes failed to start. The reason is that Zun adds a "cluster-store" configuration option to /etc/docker/daemon.json:

[root@compute01 ~]# cat /etc/docker/daemon.json
{
    "bridge": "none",
    "cluster-store": "etcd://192.168.20.141:2379,192.168.20.142:2379,192.168.20.143:2379",
    "insecure-registries": [
        "172.28.7.140:4000"
    ],
    "ip-forward": false,
    "iptables": false,
    "log-opts": {
        "max-file": "5",
        "max-size": "50m"
    }
}

In Docker versions before 23.x this option was only deprecated, but in 23.x it has been removed, so Docker fails to start. I figured this out by running dockerd manually:

sudo dockerd
docker Host-discovery and overlay networks with external k/v stores are deprecated. The 'cluster-advertise', 'cluster-store', and 'cluster-store-opt' options have been removed
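
To check a node before it breaks, something like the following sketch could help. It only looks for a "cluster-store" key in a daemon.json file and compares the major version of a Docker version string against 23. The function names are made up for this example and are not part of Docker or kolla-ansible.

```shell
#!/bin/sh
# Sketch: decide whether a node is affected by the removed cluster-store option.

# Returns 0 if the given daemon.json contains a "cluster-store" key.
has_cluster_store() {
    grep -q '"cluster-store"' "$1"
}

# Returns 0 if the version string (e.g. "23.0.1") has a major version >= 23.
docker_removed_cluster_store() {
    major="${1%%.*}"
    [ "$major" -ge 23 ]
}

# $1 = path to daemon.json, $2 = Docker version string.
# Prints "affected" if the option is set and the version no longer supports it.
check_node() {
    if has_cluster_store "$1" && docker_removed_cluster_store "$2"; then
        echo "affected"
    else
        echo "ok"
    fi
}
```

On an RPM-based node this could be called as `check_node /etc/docker/daemon.json "$(rpm -q --qf '%{VERSION}' docker-ce)"`, assuming Docker was installed from the docker-ce package.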

More information on deprecated features in Docker can be found here:

Deprecated Engine Features
Deprecated Features.

In my staging cluster I was able to fix this easily by downgrading Docker on all my nodes. On a production cluster you should do this node by node and migrate all instances off each compute node beforehand! I also recommend version-locking these packages for now. For this we can simply install the dnf versionlock plugin:

ansible -i multinode all -m shell -a "dnf install -y python3-dnf-plugin-versionlock; dnf -y remove docker-buildx-plugin; dnf -y install docker-ce-20.10.23-3.el8.x86_64 docker-ce-cli-20.10.23-3.el8.x86_64 docker-ce-rootless-extras-20.10.23-3.el8.x86_64;dnf versionlock docker-ce-20.10.23-3.el8.x86_64 docker-ce-cli-20.10.23-3.el8.x86_64 docker-ce-rootless-extras-20.10.23-3.el8.x86_64 containerd.io docker-compose-plugin" -b

Due to the disruptive downgrade, some containers on my nodes did not come up healthy. I fixed this by running:

kolla-ansible -i multinode mariadb_recovery

I have also applied this change to my Kolla-Ansible Terraform manifest.