I've Created a Monster

This monster is called OpenStack. Yes, my beloved OpenStack cluster has now been running for over two years, and along the way I have faced many obstacles. Some of these challenges are still not fully resolved even today. But I keep learning more and more.

Octavia Certificate Expiration

Just recently, we ran into issues with Octavia, and it turns out that the client certificates generated by Kolla-Ansible are only valid for one year. An easy fix: generate new ones.

. /etc/kolla/admin-openrc.sh
date="$(date +"%m-%d-%y-%H-%M")"
# back up the old certificates before regenerating
mkdir -p /etc/kolla/backup
mv /etc/kolla/octavia-certificates "/etc/kolla/backup/octavia-certificates-${date}"
kolla-ansible octavia-certificates
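If you want to check how long your certificates are actually valid, openssl can tell you. Here is a self-contained sketch using a throwaway certificate as a stand-in, since the exact filenames under /etc/kolla/octavia-certificates vary between Kolla-Ansible versions:

```shell
# generate a throwaway self-signed certificate valid for 365 days, as a
# stand-in for one of the files under /etc/kolla/octavia-certificates
openssl req -x509 -newkey rsa:2048 -nodes -keyout /tmp/demo.key \
    -out /tmp/demo.crt -days 365 -subj "/CN=octavia-demo"

# print the expiry date (notAfter=...)
openssl x509 -enddate -noout -in /tmp/demo.crt

# -checkend exits non-zero if the cert expires within N seconds (30 days here)
openssl x509 -checkend $((30*24*3600)) -noout -in /tmp/demo.crt \
    && echo "certificate still valid" || echo "certificate expiring soon"
```

Point the last two commands at your real certificate files and you have a quick expiry check you can run before things break.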

Next, we need to fail over all load balancers. Each failover triggers the creation of a new amphora VM with a fresh certificate.

# collect the IDs of all load balancers that have an ACTIVE amphora
LB_IDS=$(openstack loadbalancer amphora list --status ACTIVE -c loadbalancer_id -f value | sort -u)
for LB_ID in ${LB_IDS}; do
    openstack loadbalancer failover "${LB_ID}"
    sleep 100
    openstack loadbalancer amphora list --loadbalancer "${LB_ID}"
done

Go and grab a few coffees. This can take a while if you have plenty of LBs.
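The fixed sleep works, but it wastes time on small load balancers and may be too short for big ones. A more robust pattern is to poll until the load balancer's provisioning_status returns to ACTIVE. Here is a minimal sketch of that idea; wait_for_active and the stub get_status are illustrative names, and in practice get_status would run `openstack loadbalancer show ${LB_ID} -c provisioning_status -f value`:

```shell
# wait_for_active: poll a status command until it reports ACTIVE or ERROR
# "$@" is the command that prints the current provisioning_status
wait_for_active() {
    while true; do
        status=$("$@")
        case "${status}" in
            ACTIVE) echo "load balancer is ACTIVE"; return 0 ;;
            ERROR)  echo "failover failed"; return 1 ;;
            *)      sleep 1 ;;
        esac
    done
}

# stand-in for: openstack loadbalancer show ${LB_ID} -c provisioning_status -f value
get_status() { echo "ACTIVE"; }

wait_for_active get_status
```

Drop the polling call into the loop in place of the sleep and the script moves on as soon as each load balancer is healthy again.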

Magnum and KinD Issues

The next big problem was Magnum. I noticed pods in the kube-system namespace failing and constantly restarting. A little while ago, I had deployed the Vexxhost CAPI driver. While this driver is great, it introduced some unfortunate consequences.

I used a KinD cluster as a master cluster, but only to kickstart a bigger cluster in OpenStack that I would later turn into the master cluster. This setup worked fine for some time. However, at some point, I needed to resize the nodes of the master cluster. But how? Since the cluster was created by KinD, I had to figure out how to do this with CAPI.

I thought, why not just resize the cluster nodes one by one? This had worked fine in other clusters before. This time, it went well for two out of three control nodes. The third one failed. Somehow, KinD wasn’t so kind in fixing this automatically, and I was left with a somewhat broken master cluster.

I had to focus on something else, so for a couple of days, the master cluster remained in this state. Then, I started seeing more and more pods restarting. All clusters were still working, but they became somewhat unresponsive at times.

It turned out that the certificates in the KinD cluster had also expired and needed to be renewed.

# run the kubeadm commands inside the KinD control-plane node container
kubeadm certs check-expiration
kubeadm certs renew all
# then restart the node container (here named "kind") on the host so the
# control-plane components pick up the renewed certificates
docker restart kind

Once I had done that, the kickstarted master cluster began healing itself automatically. Really cool. 😄 However, it is still shown as failed when running clusterctl. Something I will (have to) look into next.


OpenStack continues to be a beast—one that I created, and one that I continue to tame. The journey is far from over, and I’m sure there will be more unexpected challenges ahead. But that’s what makes it exciting!

Sources

OpenStack: Octavia LoadBalancer (LBaaS) | panticz.de