Kolla-Ansible OpenStack 2023.2 released - How to upgrade

At the beginning of December, Kolla-Ansible released its support for OpenStack 2023.2 (Bobcat). The most notable changes (to me) are:
- RabbitMQ quorum queues
- These are now enabled by default and I highly recommend using them. Without quorum queues you will run into problems if the control node that currently holds the VIP goes down. A minimal globals.yml snippet is shown after this list.
- Added log retention in OpenSearch
- Great. This was missing since the migration from Elastic.
- Added Let's Encrypt TLS certificate service integration into the OpenStack deployment
- Really interesting, worth checking out.
- Implements support for Podman deployment as an alternative to Docker
- I guess for any new deployment the switch to Podman can be a good choice.
- Added support for copying {{ node_custom_config }}/magnum/kubeconfig into the Magnum containers for the magnum-cluster-api driver
- I tested this and while it works, it has to be configured, as Magnum will otherwise fail to deploy. It is not sufficient to just have an empty config file. And by "it works" I don't mean that the K8s Cluster API driver now works out of the box. It doesn't. :(
- The etcd tooling has been updated to handle adding and removing nodes
- Great. I once had to replace one of my control nodes and didn't manage to add it back to the etcd cluster.
- The Glance, Cinder and Manila services now support configuration of multiple Ceph cluster backends
- The flag --check-expiry has been added to the octavia-certificates command
- Very helpful.
- The Octavia amphora provider driver improves control plane resiliency
- You need to enable Redis, since enable_octavia_jobboard is now enabled by default and requires Redis.
- Added support for Cinder-Backup with S3 backend
- Added support for Glance with S3 backend
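As mentioned in the quorum queue item above, here is a minimal globals.yml sketch. The variable name is the one I know from the Kolla-Ansible RabbitMQ docs; it defaults to enabled in 2023.2, so you only need to set it if you want to be explicit or to opt out:
om_enable_rabbitmq_quorum_queues: "yes"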
I have managed to upgrade my 2023.1 (Antelope) deployment. I had some issues with the migration to RabbitMQ quorum queues: the docs are not quite clear about which services need to be stopped. They simply say all services that use RabbitMQ, which to me means you need downtime for the upgrade, since you have to stop Nova, Keystone, Glance, etc. It would be nice if the docs were clearer here. In my test environment I pretty much followed the official upgrade procedure.
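For reference, the rough upgrade flow looks something like the following. The pip version pin is an assumption on my side (the 17.x series should map to 2023.2); adjust it to however you installed Kolla-Ansible:
pip install --upgrade 'kolla-ansible>=17,<18'
kolla-ansible install-deps
kolla-ansible -i multinode prechecks
kolla-ansible -i multinode pull
kolla-ansible -i multinode upgrade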
RabbitMQ Quorum Queues
This is how I managed to upgrade to Quorum Queues. Please do note that this is disruptive and you might need to stop additional services.
kolla-ansible -i multinode stop --tags neutron,nova,glance,cinder,keystone,heat --yes-i-really-really-mean-it
ansible -i multinode -m shell -a "docker exec rabbitmq rabbitmqctl stop_app" -b control
ansible -i multinode -m shell -a "docker exec rabbitmq rabbitmqctl force_reset" -b control
ansible -i multinode -m shell -a "docker exec rabbitmq rabbitmqctl start_app" -b control
kolla-ansible -i multinode deploy --tags neutron,nova,glance,cinder,keystone,heat --yes-i-really-really-mean-it
kolla-ansible -i multinode prechecks
Again, I'm not 100% sure whether you really need to stop all of these services, but for me it worked just fine. You can see which queues need to be upgraded by logging in to one of the control nodes and executing the following command in the rabbitmq container:
ssh control01
sudo docker exec -it rabbitmq bash
rabbitmqctl list_queues --silent name type | egrep -v '(fanout|reply)'
Octavia
As mentioned above, you will need to either enable Redis in globals.yml:
enable_redis: "yes"
Or disable the jobboard:
enable_octavia_jobboard: "no"
I highly recommend not disabling the jobboard and enabling Redis instead. It will make Octavia more resilient.
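If you enable Redis as part of the upgrade, it has to be rolled out before Octavia tries to use the jobboard. A hedged sketch using the usual service tags:
kolla-ansible -i multinode deploy --tags redis,octavia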
Magnum
While some progress has been made to integrate the K8s Cluster API driver, it is not 100% working out of the box yet. You still need to install Helm in the Magnum containers, and you need to provide a kubeconfig file even if you are not using CAPI at the moment. This seems to be a bug: at least when I deployed Magnum in 2023.2, it kept failing due to the missing kubeconfig file. You can create a small K8s cluster using KIND and copy its kubeconfig.
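For reference, one way to produce such a kubeconfig with KIND (the cluster name capi-bootstrap is just an example; writing it to ~/config matches the copy command below):
kind create cluster --name capi-bootstrap
kind get kubeconfig --name capi-bootstrap > ~/config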
mkdir -p /etc/kolla/config/magnum
cp ~/config /etc/kolla/config/magnum/kubeconfig
Ceph
There was also a configuration change to the names of the Ceph keyring files.
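If your keyring files on disk still use the old naming scheme, you should be able to pin the names explicitly in globals.yml. The variable names below are taken from the external Ceph guide and the values are just an example of the old scheme; please verify the new defaults in the release notes:
ceph_glance_keyring: "ceph.client.glance.keyring"
ceph_cinder_keyring: "ceph.client.cinder.keyring"
ceph_cinder_backup_keyring: "ceph.client.cinder-backup.keyring"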
Happy upgrading and good luck. :)