Recover BTRFS Array After Device Failure
Today, one of the drives in my BTRFS RAID 10 array failed, so I am writing up how I handled the situation for others, and in case it ever comes up again.
Symptoms
I run non-critical VirtualBox instances over NFS from my BTRFS RAID array, and I had been doing so for over a year without any issues. However, I noticed that the instances had become unresponsive and incredibly slow; so slow that there was no way not to notice. At this point I checked on the array with sudo btrfs fi show
and everything came back as fine. I tried running a defragmentation operation to see if that would speed things up, and it didn't. It wasn't until I performed a reboot and the server failed to come back online that it became clear what had happened. In future, I recommend unmounting and re-mounting the filesystem if you notice it slowing down, to check for issues.
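With hindsight, a quicker way to spot this kind of problem is to check the per-device error counters and the kernel log rather than waiting for a reboot; a minimal sketch, assuming the array is mounted at /raid10:
sudo btrfs device stats /raid10
dmesg | grep -i btrfs
Non-zero read/write/flush error counts against a single device are a strong hint that it is on the way out.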
Recovery Steps
Firstly, you need to mount the array in "degraded" mode. This tells the system to mount the filesystem even though it knows the data is no longer fully redundant. By refusing to mount normally, BTRFS forces you to acknowledge that something is wrong before you carry on.
Pick one of the drives in your RAID array and mount it like so:
sudo mount -o degraded /dev/sd[x] /path/to/mount/point
If you are unsure which device to use, you can look up the devices with sudo blkid
which will list any drives with TYPE="btrfs", like in the example below:
/dev/sda: UUID="7c95ba2a-deb8-4163-aa3c-299667bfcb43" UUID_SUB="8d0f5b96-2f93-4afe-b602-c3a8e0497111" TYPE="btrfs"
/dev/sdb1: UUID="D1AC-53DB" TYPE="vfat"
/dev/sdb2: UUID="69cdbe20-9773-4036-9e84-d6a48faf4c4b" TYPE="swap"
/dev/sdb3: UUID="3e81667f-9ed3-417f-816d-d64dd11f2a69" TYPE="ext4"
/dev/sdc1: UUID="7c95ba2a-deb8-4163-aa3c-299667bfcb43" UUID_SUB="b9371f1c-33ea-4ceb-9fc8-fe374cf9fc8f" TYPE="btrfs"
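If the output is long, you can also ask blkid to show only the BTRFS devices; a small convenience, not a required step:
sudo blkid -t TYPE="btrfs"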
Do not list all the devices in the array like so. This will not work.
sudo mount -o degraded -t btrfs \
    /dev/sda \
    /dev/sdb \
    /dev/sdc1 \
    /dev/sdd \
    /dev/sde \
    /dev/sdf \
    /raid10
Now we can automatically remove the missing device from the array. Execute the command below in a session management tool such as screen, tmux, or byobu because it may not return for many hours.
sudo btrfs device delete missing /path/to/btrfs/mount
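For example, running it inside tmux means the operation carries on even if your SSH session drops (the session name is just an example; /raid10 is my mount point):
tmux new -s btrfs-recovery
sudo btrfs device delete missing /raid10
You can then detach with Ctrl-b d and re-attach later with tmux attach -t btrfs-recovery to check on it.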
If you have a spare drive bay, you can then add the replacement drive to the array with:
sudo btrfs device add /dev/sd[x] /path/to/mount
If you don't have any spare slots, I hope you labelled your drives so that you can remove the failed one and put the new one in its place. Notice that when deleting, you specify missing
as the parameter and let BTRFS take care of the rest, preventing you from accidentally specifying the wrong drive.
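As an aside, if you do have the replacement drive connected, newer versions of btrfs-progs can do the whole swap in one step with btrfs replace, which copies data directly onto the new disk rather than relocating everything through a delete; a sketch, assuming the missing device had devid 2 and the new disk is /dev/sdg:
sudo btrfs replace start 2 /dev/sdg /raid10
sudo btrfs replace status /raid10
I stuck with device add/delete here, so treat this as an alternative to research rather than the route I tested.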
Unfortunately, the device delete results in a complete rebalance of blocks across all of the remaining drives. This means you are screwed if any of the remaining drives fails during this window, in which they all have to work pretty hard for many hours, making the exposure similar to a RAID 5 rebuild.
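There is no dedicated progress indicator for a device delete, but you can get a rough idea by watching the used space move between devices while it runs; for example:
sudo btrfs fi show /raid10
sudo btrfs fi df /raid10
The used figure on the missing device should shrink towards zero as its chunks are relocated.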
My CPU was hard at work as shown in the picture below.
Grey lines represent time the CPU was waiting on the drives, whilst green represents calculation time. One thread would always be running at max capacity, whilst two threads were eaten up by disk wait time. However, I think there may have been another issue going on as well, as the output of top showed the Xorg process at 100% utilization.
To Be Finished
I am guessing that after the rebalance is finished, I should be able to unmount and remount my array without the degraded flag. I've been waiting over 10 hours for the balance to finish, so I thought I'd release the tutorial and update this bit later.
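Assuming it does finish cleanly, the final remount should be nothing more than this (device and mount point as in my setup):
sudo umount /raid10
sudo mount /dev/sda /raid10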
Update
Eventually, I got an IO error from the device delete command. Running sudo btrfs fi show
showed that I had managed to remove all the data from device 2, but it was refusing to be "removed" from the array.
Label: none uuid: 58bd01a7-f160-4fea-aed3-c378c2332699
Total devices 6 FS bytes used 6.76TiB
devid 1 size 2.73TiB used 2.50TiB path /dev/sda
devid 2 size 0.00 used 2.50TiB path
devid 3 size 3.64TiB used 2.50TiB path /dev/sdf
devid 4 size 2.73TiB used 2.50TiB path /dev/sdb
devid 5 size 2.73TiB used 2.50TiB path /dev/sdc1
devid 6 size 3.64TiB used 2.50TiB path /dev/sde
I tried unmounting and remounting the array with no success, even in degraded mode. At this point my heart sank, and as a last-ditch attempt I unplugged each drive one by one and plugged it back in before booting the server up again. This is where it gets weird. The server booted without complaint and the array was mounted without me having to manually mount it in degraded mode. I decided to test this by unmounting and remounting it, which failed, and I had to mount in degraded mode again. I then tried deleting /dev/sdd again, which also failed. Next I unmounted the array and ran btrfsck
on it, which spat out a load of warnings/errors (see appendix). I then mounted the array, which again only worked in degraded mode, and am now running a btrfs scrub
which is finding and fixing lots of issues.
scrub status for 58bd01a7-f160-4fea-aed3-c378c2332699
scrub started at Sun Feb 7 21:13:52 2016 and was aborted after 4135 seconds
total bytes scrubbed: 1.57TiB with 238991 errors
error details: verify=2272 csum=236719
corrected errors: 238988, uncorrectable errors: 0, unverified errors: 0
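For reference, the scrub itself is driven with a handful of subcommands (mount point assumed to be /raid10):
sudo btrfs scrub start /raid10
sudo btrfs scrub status -d /raid10
sudo btrfs scrub cancel /raid10
sudo btrfs scrub resume /raid10
The cancel and resume commands become relevant further down, when the scrub gets interrupted.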
Unfortunately, this is where I am now: mid-way through a scrub. I cannot tell whether running this scrub will "fix" the array or allow me to remove /dev/sdd. Running sudo btrfs scrub status -d /raid10
outputs the following:
scrub device /dev/sda (id 1) status
scrub started at Sun Feb 7 21:13:52 2016, running for 4711 seconds
total bytes scrubbed: 457.60GiB with 0 errors
scrub device /dev/sde (id 2) status
scrub started at Sun Feb 7 21:13:52 2016, running for 4711 seconds
total bytes scrubbed: 396.75GiB with 286399 errors
error details: verify=2272 csum=284127
corrected errors: 286399, uncorrectable errors: 0, unverified errors: 0
scrub device /dev/sdg (id 3) status
scrub started at Sun Feb 7 21:13:52 2016, running for 4711 seconds
total bytes scrubbed: 224.18GiB with 0 errors
scrub device /dev/sdb (id 4) status
scrub started at Sun Feb 7 21:13:52 2016, running for 4711 seconds
total bytes scrubbed: 289.20GiB with 0 errors
scrub device /dev/sdd1 (id 5) history
scrub started at Sun Feb 7 21:13:52 2016 and was aborted after 0 seconds
total bytes scrubbed: 0.00 with 0 errors
scrub device /dev/sdf (id 6) status
scrub started at Sun Feb 7 21:13:52 2016, running for 4711 seconds
total bytes scrubbed: 475.63GiB with 0 errors
This shows that there might also have been an issue with /dev/sde. I'm hoping that /dev/sdd shows no scrubbing because we successfully moved its data off, and not because it has just flat-out failed. I'm not sure whether /dev/sdd has actually failed at all now, as after one of the reboots it appears to be operating. I've run SMART scans on all the drives and they all appear healthy, except that the scan on /dev/sdd gets stuck with 10% remaining. At this point the server became unresponsive and I had to perform a hard reboot. When I mounted the array in degraded mode and tried to resume the scrub, the elapsed time was not increasing. Cancelling the scrub would fail, saying that one wasn't running, whilst starting it would say that one was already running. I used the line below from Marc's blog to force a cancellation.
perl -pi -e 's/finished:0/finished:1/' /var/lib/btrfs/*
After that, I was able to start the scrub again. So far it hasn't found any issues.
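For the SMART scans mentioned above, I used smartmontools; the commands look roughly like this, with the device name just an example:
sudo smartctl -t long /dev/sdd
sudo smartctl -a /dev/sdd
The first kicks off a long self-test in the background, and the second prints the drive's attributes along with the self-test progress.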
Debugging
If you do not mount the array in degraded mode before trying to remove the failed device, you will get an error like:
ERROR: error removing the device '/dev/sdb' - Inappropriate ioctl for device
When you're in a panic, seeing this message can cause you to need your brown pants. I'm hoping this tutorial will come up for others Googling this error message.
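Before retrying the delete, it is worth double-checking how the filesystem is actually mounted; for example:
findmnt -t btrfs -o TARGET,SOURCE,OPTIONS
If degraded does not appear in the options column, unmount and remount with -o degraded first.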
References
- Red Hat Docs - 3.4.5. Removing btrfs devices
- BTRFS Wiki - Using Btrfs with Multiple Devices
- Superuser - Repair btrfs RAID? Error: Inappropriate ioctl for device
Appendix
Output of btrfsck
sudo btrfsck /dev/sdd1
No valid Btrfs found on /dev/sdd1
sudo btrfsck /dev/sde
warning, device 5 is missing
warning devid 5 not found already
Checking filesystem on /dev/sde
UUID: 58bd01a7-f160-4fea-aed3-c378c2332699
checking extents
checking free space cache
Error reading 36824793235456, -1
failed to load free space cache for block group 25965986381824
free space inode generation (0) did not match free space cache generation (263494)
free space inode generation (0) did not match free space cache generation (263494)
free space inode generation (0) did not match free space cache generation (263494)
free space inode generation (0) did not match free space cache generation (263494)
free space inode generation (0) did not match free space cache generation (263494)
free space inode generation (0) did not match free space cache generation (263494)
free space inode generation (0) did not match free space cache generation (263494)
free space inode generation (0) did not match free space cache generation (268613)
free space inode generation (0) did not match free space cache generation (268612)
free space inode generation (0) did not match free space cache generation (315822)
free space inode generation (0) did not match free space cache generation (315826)
free space inode generation (0) did not match free space cache generation (271362)
free space inode generation (0) did not match free space cache generation (278248)
free space inode generation (0) did not match free space cache generation (285754)
free space inode generation (0) did not match free space cache generation (271506)
free space inode generation (0) did not match free space cache generation (271362)
free space inode generation (0) did not match free space cache generation (279813)
free space inode generation (0) did not match free space cache generation (285766)
free space inode generation (0) did not match free space cache generation (285697)
free space inode generation (0) did not match free space cache generation (285758)
free space inode generation (0) did not match free space cache generation (285754)
free space inode generation (0) did not match free space cache generation (285697)
free space inode generation (0) did not match free space cache generation (271481)
free space inode generation (0) did not match free space cache generation (279813)
free space inode generation (0) did not match free space cache generation (279813)
free space inode generation (0) did not match free space cache generation (285754)
free space inode generation (0) did not match free space cache generation (285697)
free space inode generation (0) did not match free space cache generation (279813)
free space inode generation (0) did not match free space cache generation (279813)
free space inode generation (0) did not match free space cache generation (279813)
free space inode generation (0) did not match free space cache generation (271475)
free space inode generation (0) did not match free space cache generation (279813)
free space inode generation (0) did not match free space cache generation (284204)
free space inode generation (0) did not match free space cache generation (279814)
free space inode generation (0) did not match free space cache generation (279814)
free space inode generation (0) did not match free space cache generation (286850)
free space inode generation (0) did not match free space cache generation (271475)
free space inode generation (0) did not match free space cache generation (271744)
free space inode generation (0) did not match free space cache generation (315822)
free space inode generation (0) did not match free space cache generation (279558)
free space inode generation (0) did not match free space cache generation (270848)
free space inode generation (0) did not match free space cache generation (315876)
checking fs roots
checking csums
checking root refs
found 5419802658561 bytes used err is 0
total csum bytes: 7247450076
total tree bytes: 8555118592
total fs tree bytes: 241958912
total extent tree bytes: 291864576
btree space waste bytes: 702146162
file data blocks allocated: 19521148178432
referenced 7291670536192
Btrfs v3.12
First published: 16th August 2018