Replacing a failed disk in Linux RAID1

In a past article I wrote about setting up a software-based RAID1 on Linux. After less than two years one of the disks failed, so I had to replace it. Here are my experiences with the procedure.


After a forced reboot (power off), the server did not boot. Rebooting manually, I found that the problem was with the RAID: the system complained about a RAID disk failure and offered to start the RAID1 in degraded mode. If this option was not chosen, it opened a maintenance shell.

Identifying failed disk

Boot with the degraded RAID1 and check the current RAID status:
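The status lives in /proc/mdstat, so cat /proc/mdstat is the quickest check. As a minimal sketch, the hypothetical helper below (degraded_arrays is my name, not part of mdadm) lists the arrays whose member map, such as [U_], contains an underscore, which marks a missing member.

```shell
# Hypothetical helper: print the names of md arrays that are running degraded.
# Reads /proc/mdstat by default; another file can be passed as the first argument.
degraded_arrays() {
    awk '/^md/ { name = $1 }                  # remember the array the next lines belong to
         /\[[U_]*_[U_]*\]/ { print name }     # member map like [U_] or [_U]: one member missing
    ' "${1:-/proc/mdstat}"
}
```

Calling degraded_arrays without arguments on the affected server should print the name of the array that lost its disk; no output means every array has all members.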

Clearly the second disk is missing from the RAID1. Next you should check what is wrong with it:

  • check logs and Linux kernel messages, for example dmesg | grep sdc,
  • check the SMART status of the disk: sudo smartctl -a /dev/sdc (also note down the disk serial number for later);
    look at attributes like “Raw_Read_Error_Rate” or “Offline_Uncorrectable” for non-zero values,
  • optionally run a long self-test on the disk: sudo smartctl --test=long /dev/sdc (takes several hours)
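To avoid reading the whole SMART attribute table by eye, the raw values can be filtered. This is a rough sketch, assuming the usual ten-column layout of smartctl -A with the raw value in column 10; the helper name smart_nonzero is hypothetical, and Reallocated_Sector_Ct and Current_Pending_Sector are my additions to the two attributes named above.

```shell
# Hypothetical helper: read `smartctl -A` output on stdin and print the
# watched attributes whose raw value (column 10) is not zero.
smart_nonzero() {
    awk '$2 ~ /^(Raw_Read_Error_Rate|Offline_Uncorrectable|Reallocated_Sector_Ct|Current_Pending_Sector)$/ &&
         $10 + 0 != 0 { print $2, $10 }'
}
```

Typical use would be sudo smartctl -A /dev/sdc | smart_nonzero. Some drive vendors report large raw values for Raw_Read_Error_Rate even on healthy disks, so treat a hit there as a hint rather than proof.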

Remove failed disk

After booting with the degraded RAID1, the problematic disk should already be removed from the RAID group. Check it with:
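The original command is not preserved here. One common choice is sudo mdadm --detail /dev/md0, where the array name /dev/md0 is an assumption. As an alternative sketch, the hypothetical helper array_members reads the same information from /proc/mdstat, where a member that is still attached but marked faulty carries an (F) suffix.

```shell
# Hypothetical helper: list the members of one array as shown in /proc/mdstat.
# Usage: array_members md0 [file]   (md0 is an assumed array name)
array_members() {
    awk -v md="$1" '
        $1 == md && $2 == ":" {
            # member fields look like sdb1[0] or sdc1[1](F)
            for (i = 3; i <= NF; i++) if ($i ~ /\[[0-9]+\]/) print $i
        }' "${2:-/proc/mdstat}"
}
```

If the partition of the failed disk does not appear in the output at all, the disk is already out of the array.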

If the problematic disk is still in the RAID group, you have to remove it from the group:
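With mdadm this takes two steps: mark the member as failed, then remove it. The sketch below assumes the array is /dev/md0 and the failed member is /dev/sdc1; substitute your own names. The wrapper remove_member and the RUN variable are hypothetical conveniences: by default the commands are only printed so they can be reviewed first.

```shell
# Hypothetical wrapper: mark a member as failed, then remove it from the array.
# RUN defaults to "echo" (print only); set RUN=sudo to actually execute.
remove_member() {
    array=$1    # e.g. /dev/md0  (assumed name)
    part=$2     # e.g. /dev/sdc1 (assumed name)
    ${RUN:-echo} mdadm --manage "$array" --fail "$part" &&
    ${RUN:-echo} mdadm --manage "$array" --remove "$part"
}
```

For example, remove_member /dev/md0 /dev/sdc1 prints both commands, and RUN=sudo remove_member /dev/md0 /dev/sdc1 runs them.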

Then the disk can be physically removed from the server (double-check the disk serial number to make sure you are removing the correct disk).

Install replacement disk

Install the new disk and start with the degraded RAID1. Create the same partitions on the new disk (this requires the gdisk package):
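The gdisk package ships sgdisk, which can copy a GPT partition table from one disk to another. A minimal sketch, assuming GPT disks, the healthy disk at /dev/sdb and the new disk at /dev/sdc (both names are assumptions). Note the argument order: the device given to --replicate is the destination, and the last argument is the source. Mixing them up overwrites the good disk. The wrapper clone_partition_table and the RUN variable are hypothetical.

```shell
# Hypothetical wrapper: copy the GPT partition table from a healthy disk to a new one.
# RUN defaults to "echo" (print only); set RUN=sudo to actually execute.
clone_partition_table() {
    src=$1    # healthy disk, e.g. /dev/sdb (assumed name)
    dst=$2    # new, empty disk, e.g. /dev/sdc (assumed name)
    # --replicate writes the table of the main device (src) onto dst
    ${RUN:-echo} sgdisk --replicate="$dst" "$src" &&
    # give the copy its own disk and partition GUIDs so the two disks do not clash
    ${RUN:-echo} sgdisk --randomize-guids "$dst"
}
```

Run clone_partition_table /dev/sdb /dev/sdc first to read the printed commands, then repeat with RUN=sudo once the source and destination are confirmed.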

Afterwards you can also change the new partition's name (with parted).
Then add the new partition to the RAID1:
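Adding the member is a single mdadm call. As before, /dev/md0 and /dev/sdc1 are assumed names, and the wrapper add_member with its RUN variable is a hypothetical convenience that prints the command unless told to execute it.

```shell
# Hypothetical wrapper: add a new partition to an existing array.
# RUN defaults to "echo" (print only); set RUN=sudo to actually execute.
add_member() {
    array=$1    # e.g. /dev/md0  (assumed name)
    part=$2     # e.g. /dev/sdc1 (assumed name)
    ${RUN:-echo} mdadm --manage "$array" --add "$part"
}
```

Once the command has run for real, the kernel starts rebuilding the mirror onto the new partition on its own.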

Then check with cat /proc/mdstat that the RAID1 is synchronizing:

or with mdadm --detail:
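During the rebuild, /proc/mdstat contains a progress line with a "recovery = NN%" field. The sketch below pulls out just that percentage per array; the helper name sync_progress is hypothetical. sudo mdadm --detail /dev/md0 (array name assumed) reports similar information in a longer form.

```shell
# Hypothetical helper: print "<array> <percent>" for every array that is rebuilding.
# Reads /proc/mdstat by default; another file can be passed as the first argument.
sync_progress() {
    awk '/^md/ { name = $1 }
         /(recovery|resync) *=/ {
             # the percentage is the only field on the progress line ending in "%"
             for (i = 1; i <= NF; i++) if ($i ~ /%$/) print name, $i
         }' "${1:-/proc/mdstat}"
}
```

Something like watch -n 10 cat /proc/mdstat gives a live view; sync_progress is handier in scripts, and it prints nothing once the rebuild has finished.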

After synchronization everything should look normal again: cat /proc/mdstat shows both members as up ([UU]).




