Cleaning Up My Ceph Cluster with OSD Backfillfull: A Journey to Optimal Performance
Managing a Ceph cluster can be a complex yet rewarding experience. Recently, I encountered an issue with OSD (Object Storage Daemon) backfillfull, which required immediate attention to ensure the smooth operation of my storage infrastructure. In this blog post, I'll share the steps I took to resolve the OSD backfillfull issue and optimize the performance of my Ceph cluster.
[root@bootstrap]# sudo ceph -s
  cluster:
    id:     5b818fec-0b08-11ec-9007-005056b783e1
    health: HEALTH_WARN
            1 backfillfull osd(s)
            15 pool(s) backfillfull

  services:
    mon: 5 daemons, quorum node02,bootstrap,node01,node03,node05 (age 17h)
    mgr: node03.pvrgzt(active, since 3w), standbys: node05.bgupnv, bootstrap.bexjaj
    osd: 12 osds: 12 up (since 8w), 12 in (since 2M)
    rgw: 2 daemons active (2 hosts, 1 zones)

  data:
    pools:   15 pools, 593 pgs
    objects: 637.67k objects, 2.8 TiB
    usage:   8.3 TiB used, 2.2 TiB / 10 TiB avail
    pgs:     592 active+clean
             1   active+clean+scrubbing+deep

  io:
    client: 252 KiB/s rd, 4.6 MiB/s wr, 209 op/s rd, 576 op/s wr

Understanding the Issue
Ceph clusters rely on OSDs to store data. Each OSD is responsible for storing data and handling replication, recovery, and rebalancing. Whenever placement groups need to move, for example after a failure or a change in the cluster layout, Ceph backfills their data onto other OSDs. Once an OSD's utilization crosses the backfillfull threshold, Ceph stops backfilling data onto it, which can leave the cluster unable to finish rebalancing and degrade performance if not managed properly.
In my case, I received a warning indicating that one of my OSDs had crossed the backfillfull threshold. This meant the OSD was nearly full and would no longer accept backfill data, and if it kept filling up it would eventually hit the full ratio and block client writes altogether. It was crucial to address this issue promptly to prevent further complications.
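The thresholds themselves live in the OSD map. To see where they currently sit, something like the following should work; the values in the comments are Ceph's defaults, not necessarily what your cluster is configured with:

sudo ceph osd dump | grep -E 'nearfull_ratio|backfillfull_ratio|full_ratio'
# nearfull_ratio 0.85     -> HEALTH_WARN, the OSD is getting close to full
# backfillfull_ratio 0.9  -> Ceph stops backfilling data onto this OSD
# full_ratio 0.95         -> client writes are blocked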
Step 1: Identifying the Full OSD
The first step was to identify which OSD had reached the backfillfull state. Using the Ceph command-line interface (CLI), I ran the following command:
sudo ceph osd df

This command provided a detailed report on the disk usage of each OSD in my cluster. I was able to identify the specific OSD that was full and needed immediate attention.
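If the table gets long, two related commands can narrow things down; ceph health detail names the offending OSD directly, and ceph osd df tree breaks the utilization down per host:

sudo ceph health detail   # lists which OSD(s) tripped the backfillfull threshold
sudo ceph osd df tree     # same usage report, grouped by host and OSD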
Step 2: Increasing the Backfill Ratio
As a temporary measure, I decided to increase the backfillfull ratio to allow the cluster to continue rebalancing. This would help alleviate the immediate pressure on the full OSD. I executed the following command:
sudo ceph osd set-backfillfull-ratio 0.95

This command raised the backfillfull ratio to 95%, giving the cluster some breathing room to continue operations while I worked on a more permanent solution.
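Worth keeping in mind: the thresholds are expected to ascend (nearfull < backfillfull < full), and 0.95 sits right at the default full ratio, so this is strictly a stopgap. Once the cleanup below had freed up space, I could drop it back to the default:

# revert to the default backfillfull threshold of 90% once there is headroom again
sudo ceph osd set-backfillfull-ratio 0.90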
Step 3: Rebalancing the Cluster
Next, I needed to ensure that the cluster was rebalancing correctly. I checked the status of the cluster using:
sudo ceph status

This command provided an overview of the cluster's health and any ongoing operations. I monitored the rebalancing process to ensure it was progressing smoothly.
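Rather than re-running the command by hand, it can be kept on screen continuously; either of the following does the job (the 10-second interval is just a value I find comfortable):

watch -n 10 sudo ceph -s   # refresh the cluster summary every 10 seconds
sudo ceph -w               # stream cluster log events as they happen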
Step 4: Cleaning Up Old Snapshots
I also discovered that my OpenStack images and volumes pool had accumulated numerous old rbd's, contributing to the high disk usage. To clean these up, I created a list of all OpenStack image IDs:

openstack image list -f value -c ID > openstack_image_ids.txt

Next I created a list of all rbd's from the images pool:

rbd ls images > openstack_rbds.txt

Now we combine those two lists, sort them, and keep only the entries that appear once, i.e. rbd's without a matching image ID:
cat openstack_image_ids.txt openstack_rbds.txt > all.txt
cat all.txt | sort | uniq -u > unique.txt

The file unique.txt should now only contain rbd's that do not exist in my OpenStack cluster, since uniq -u keeps lines that appear exactly once and anything present in both lists is dropped.
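As a side note, the same orphan list can be produced in one step with comm, which computes the set difference without the intermediate all.txt; just an alternative sketch, assuming both files contain one name per line:

# lines that only appear in the second input = rbd's with no matching OpenStack image ID
comm -13 <(sort openstack_image_ids.txt) <(sort openstack_rbds.txt) > unique.txt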
Now we can loop through that file and remove all orphaned rbd's. But before I do that, I export the rbd's just to be on the safe side. For this I quickly added an NFS share to my machine to store them.

for rbd in $(cat unique.txt); do echo "rbd export images/$rbd - | gzip > $rbd.img.gz"; rbd export images/$rbd - | gzip > $rbd.img.gz; done

After some time the rbd's were exported.
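Before deleting anything I wanted some confidence that the exports were actually usable, so a quick integrity check of the gzip archives doesn't hurt; a minimal sketch that only validates the compression, not the image contents:

for f in *.img.gz; do gzip -t "$f" || echo "corrupt export: $f"; done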
With the exports safely on the NFS share it was time to get rid of the orphans:

for rbd in $(cat unique.txt); do echo "rbd rm images/$rbd"; rbd rm images/$rbd; done

This command removed all orphaned rbd's, freeing up valuable disk space.
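A caveat worth noting: rbd rm refuses to delete an image that still has snapshots, and Glance-created images typically carry a protected snapshot named snap. If the loop fails on such an image, the snapshots have to be unprotected and purged first, along these lines (the snapshot name follows the Glance convention and may differ):

rbd snap unprotect images/$rbd@snap   # only needed for protected snapshots
rbd snap purge images/$rbd            # remove any remaining snapshots
rbd rm images/$rbd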
Final step: Adding More Capacity
To prevent the issue from recurring, I decided to add more OSDs to increase the overall capacity of the cluster. Adding new OSDs would help distribute the data more evenly and reduce the likelihood of hitting the backfillfull threshold in the future.
I'm using cephadm, so all I needed to do was add the new disks to my servers. They were automatically initialized and added to the cluster. To speed up device discovery I ran:
sudo ceph orch device ls --refresh

This automatic pick-up of new disks is the default behaviour. It can be disabled by running:
sudo ceph orch apply osd --all-available-devices --unmanaged=true

After adding all disks I checked their status:
sudo ceph osd status

Finally, the cluster status is healthy again.
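As a closing note, adding capacity only helps if the data actually spreads out evenly, and an uneven PG distribution is often what pushes a single OSD into backfillfull in the first place. The mgr balancer module can keep utilization more even going forward; a sketch, assuming a release recent enough to support upmap mode and clients that can handle it:

sudo ceph balancer status      # check whether the balancer is enabled and which mode it uses
sudo ceph balancer mode upmap  # upmap generally gives the most even PG distribution
sudo ceph balancer on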