Managing a Ceph cluster can be a complex yet rewarding experience. Recently, I encountered an issue with OSD (Object Storage Daemon) backfillfull, which required immediate attention to ensure the smooth operation of my storage infrastructure. In this blog post, I'll share the steps I took to resolve the OSD backfillfull issue and optimize the performance of my Ceph cluster.
[root@bootstrap]# sudo ceph -s
  cluster:
    id:     5b818fec-0b08-11ec-9007-005056b783e1
    health: HEALTH_WARN
            1 backfillfull osd(s)
            15 pool(s) backfillfull

  services:
    mon: 5 daemons, quorum node02,bootstrap,node01,node03,node05 (age 17h)
    mgr: node03.pvrgzt(active, since 3w), standbys: node05.bgupnv, bootstrap.bexjaj
    osd: 12 osds: 12 up (since 8w), 12 in (since 2M)
    rgw: 2 daemons active (2 hosts, 1 zones)

  data:
    pools:   15 pools, 593 pgs
    objects: 637.67k objects, 2.8 TiB
    usage:   8.3 TiB used, 2.2 TiB / 10 TiB avail
    pgs:     592 active+clean
             1   active+clean+scrubbing+deep

  io:
    client: 252 KiB/s rd, 4.6 MiB/s wr, 209 op/s rd, 576 op/s wr
Understanding the Issue
Ceph clusters rely on OSDs to store data; each OSD handles storage, replication, recovery, and rebalancing. Whenever data has to move between OSDs, for example after a topology change or during a rebalance, Ceph performs backfill operations. An OSD that crosses the backfillfull threshold stops accepting backfill data, which stalls rebalancing towards it, and if usage keeps climbing it will eventually hit the full threshold and block client writes. Left unmanaged, this can degrade the whole cluster.
In my case, I received a warning indicating that one of my OSDs had reached the backfillfull threshold. The OSD was nearly full and rebalancing onto it was blocked, so it was crucial to address the issue promptly to prevent further complications.
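Ceph tracks three related thresholds, and it is worth knowing where they sit before changing anything. As far as I know the defaults are nearfull at 85%, backfillfull at 90% and full at 95%; the values actually in effect can be read from the OSD map:
sudo ceph osd dump | grep full_ratio
Between backfillfull and full an OSD still serves client I/O but refuses backfill; once full is reached, writes are blocked as well.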
Step 1: Identifying the Full OSD
The first step was to identify which OSD had reached the backfillfull state. Using the Ceph command-line interface (CLI), I ran the following command:
sudo ceph osd df
This command provided a detailed report on the disk usage of each OSD in my cluster. I was able to identify the specific OSD that was full and needed immediate attention.
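Another quick way to find the culprit is ceph health detail, which prints the backfillfull warning together with the ID of the affected OSD, so there is no need to scan the table by hand:
sudo ceph health detail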
Step 2: Increasing the Backfill Ratio
As a temporary measure, I decided to raise the backfillfull ratio to allow the cluster to continue rebalancing. This would help alleviate the immediate pressure on the full OSD. I executed the following command:
sudo ceph osd set-backfillfull-ratio 0.95
This command raised the backfillfull threshold from the default of 90% to 95%, giving the cluster some breathing room to continue operations while I worked on a more permanent solution.
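To double-check that the change took effect, the value can be read back from the OSD map. Keep in mind that the backfillfull ratio should stay at or below the full ratio (0.95 by default), so 95% is really the last bit of headroom rather than a long-term setting:
sudo ceph osd dump | grep backfillfull_ratio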
Step 3: Rebalancing the Cluster
Next, I needed to ensure that the cluster was rebalancing correctly. I checked the status of the cluster using:
sudo ceph status
This command provided an overview of the cluster's health and any ongoing operations. I monitored the rebalancing process to ensure it was progressing smoothly.
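Because rebalancing can take a while, I find it convenient to follow the cluster log live instead of re-running the status command by hand; health changes and warnings show up as they happen:
sudo ceph -w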
Step 4: Cleaning Up Orphaned RBDs
I also discovered that the pools backing my OpenStack images and volumes had accumulated numerous old rbd's, contributing to the high disk usage. To clean up the images pool, I first created a list of all OpenStack image IDs:
openstack image list -f value -c ID > openstack_image_ids.txt
Next, I created a list of all rbd's in the images pool:
rbd ls images > openstack_rbds.txt
Now we combine the two lists, sort them, and keep only the entries that appear exactly once; those should be rbd's with no matching Glance image (this assumes every active Glance image actually has a corresponding rbd in the pool):
cat openstack_image_ids.txt openstack_rbds.txt > all.txt
cat all.txt | sort | uniq -u > unique.txt
The file unique.txt should now contain only rbd's that no longer exist in my OpenStack cluster. We can loop through that file and remove all orphaned rbd's, but to be on the safe side I wanted to export them first, so I quickly mounted an NFS share on the machine to store the exports.
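Before exporting and deleting anything, a quick spot check is cheap insurance. This is just a sketch using the first entry of the list as an example; exact error messages vary by release:
candidate=$(head -n 1 unique.txt)
openstack image show "$candidate"   # expected to fail: the image should be unknown to Glance
rbd info "images/$candidate"        # expected to succeed: the rbd still exists in the pool
With that confirmed, I exported each candidate to the NFS share: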
for rbd in $(cat unique.txt); do echo "rbd export images/$rbd - | gzip > $rbd.img.gz"; rbd export images/$rbd - | gzip > $rbd.img.gz; done
After some time, the rbd's were exported and it was time to get rid of them:
for rbd in $(cat unique.txt); do echo "rbd rm images/$rbd"; rbd rm images/$rbd; done
This command removed all orphaned rbd's, freeing up valuable disk space.
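One caveat worth mentioning: rbd rm refuses to delete an image that still has snapshots. In my case the loop ran through without complaints, but if some of your orphans still carry snapshots, they can be purged first and the removal loop re-run; only do this if no clones depend on those snapshots (the loop below is just a sketch):
for rbd in $(cat unique.txt); do rbd snap purge "images/$rbd"; done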
Final Step: Adding More Capacity
To prevent the issue from recurring, I decided to add more OSDs to increase the overall capacity of the cluster. Adding new OSDs would help distribute the data more evenly and reduce the likelihood of hitting the backfillfull threshold in the future.
I'm using cephadm, so all I needed to do was add the new disks to my servers. They were automatically initialized and added to the cluster. To speed up the discovery process, I ran:
sudo ceph orch device ls --refresh
This automatic consumption of new devices is the default behaviour; it can be disabled by running:
sudo ceph orch apply osd --all-available-devices --unmanaged=true
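If you are not sure which OSD specification cephadm is currently applying, or whether it is set to unmanaged, the active spec can be exported and inspected (this only applies to cephadm-managed clusters, which is what I run):
sudo ceph orch ls osd --export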
After adding all the disks, I checked their status:
sudo ceph osd status
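Once the data had rebalanced onto the new OSDs, I had another look at the distribution. The balancer module (enabled by default on recent releases, as far as I know) should keep it reasonably even from here on:
sudo ceph osd df
sudo ceph balancer status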
Finally, the cluster status was healthy again.