* [gentoo-user] A drive in my RAID6 has failed
From: Paul Hartman @ 2013-09-05 16:49 UTC
To: Gentoo User

Hi,

I woke up this morning to see the dreaded email from mdadm telling me one of my drives failed overnight, while I was happily dreaming about cute puppies and kittens installing a rainbow-colored roof on my house. The array is a RAID6 (two parity drives) and this is the current state:

    md0 : active raid6 sdd1[5] sdg1[4] sde1[3](F) sdh1[2] sdf1[1] sdi1[0]
          11720009728 blocks super 1.2 level 6, 512k chunk, algorithm 2 [6/5] [UUU_UU]

I've been using RAID in Linux for years, but this is actually the first time I've had a disk fail in one.

If I remember correctly, the process should be as simple as:

    # remove the failed disk from the array:
    mdadm /dev/md0 -r /dev/sde1

    # pull the drive, replace with new one, partition it, then add it to the array:
    mdadm /dev/md0 -a /dev/sde1

and sit back and eat popcorn while I enjoy the blinkenlights for the next several hours/days? :) Any advice/suggestions for managing this process any differently?

For now I have unmounted the filesystem that sits atop it, to prevent any more writes from occurring, just in case...

Thanks,
Paul
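In the mdstat output above, `[6/5]` means the array is configured for six members and five are active. `[UUU_UU]` lists the slots in order, starting from slot 0. A `U` marks an up-to-date member, and `_` marks one that is missing, failed, or still rebuilding. The underscore in the fourth position is therefore slot 3, which matches `sde1[3](F)`. The sketch below uses awk to pick those slots out of mdstat-style text. `degraded_slots` is a hypothetical helper and is not part of mdadm. It assumes the status field has the `[U_...]` shape shown here.

```shell
#!/bin/sh
# degraded_slots: hypothetical helper (not part of mdadm). Reads
# /proc/mdstat-style text on stdin and prints, per array, every slot
# shown as "_" in the status field, e.g. "[UUU_UU]" -> slot 3.
degraded_slots() {
    awk '
        # An array line starts with "mdN :"; remember the array name.
        /^md[0-9]+ *:/ { array = $1 }
        {
            for (i = 1; i <= NF; i++) {
                # Only a bracketed field made purely of U and _ is a status field.
                if ($i ~ /^\[[U_]+\]$/) {
                    status = substr($i, 2, length($i) - 2)
                    for (j = 1; j <= length(status); j++)
                        if (substr(status, j, 1) == "_")
                            print array, "slot", j - 1, "not up to date"
                }
            }
        }
    '
}
```

To try it on a live system, run `degraded_slots < /proc/mdstat`. Given the two mdstat lines from this message, it prints `md0 slot 3 not up to date`. A healthy `[UUUUUU]` array produces no output.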
* Re: [gentoo-user] A drive in my RAID6 has failed
From: Michael Orlitzky @ 2013-09-05 16:52 UTC
To: gentoo-user

On 09/05/2013 12:49 PM, Paul Hartman wrote:
> [...]
> If I remember correctly, the process should be as simple as:
>
> #remove the failed disk from the array:
> mdadm /dev/md0 -r /dev/sde1
>
> #pull the drive, replace with new one, partition it, then add it to the array:
> mdadm /dev/md0 -a /dev/sde1
>
> and sit back and eat popcorn while I enjoy the blinkenlights for the
> next several hours/days? :) Any advice/suggestions for managing this
> process any differently?

This is the process I always follow:

http://www.howtoforge.com/replacing_hard_disks_in_a_raid1_array

The sfdisk trick will save you a bit of hassle.
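Paul's steps can be combined with the partition-copy shortcut Michael mentions. The "sfdisk trick" is usually understood as dumping a healthy member's partition table with `sfdisk -d` and piping it into `sfdisk` on the new disk. The sketch below is a script that only prints the commands unless `DRY_RUN=0` is set. It rests on several assumptions that you should check on your own system:

* The device names come from the mdstat output above.
* `/dev/sdd` is an arbitrary healthy member chosen as the layout template.
* The disks use GPT.

The GPT guess is based on size. The array's 11720009728 KiB is spread over four data members (six disks minus two for parity), which gives 2930002432 KiB, or about 2.7 TiB, per member. That is more than an MBR partition table can address with 512-byte sectors, and `sfdisk` releases from 2013 did not understand GPT. For that reason the sketch uses `sgdisk` from gptfdisk instead. The script name and helper names are hypothetical.

```shell
#!/bin/sh
# Sketch of the disk-replacement steps discussed in this thread.
# Device names are assumptions copied from the mdstat output; verify them
# (mdadm --detail /dev/md0, lsblk) before running with DRY_RUN=0.

ARRAY=${ARRAY:-/dev/md0}
MEMBER_PART=${MEMBER_PART:-/dev/sde1}      # failed member, later its replacement
NEW_DISK=${NEW_DISK:-/dev/sde}             # the replacement disk
TEMPLATE_DISK=${TEMPLATE_DISK:-/dev/sdd}   # assumed healthy member to copy from
DRY_RUN=${DRY_RUN:-1}                      # 1 = only print the commands

# run: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Step 1: mark the member failed (already true here) and remove it.
step_remove() {
    run mdadm "$ARRAY" --fail "$MEMBER_PART"
    run mdadm "$ARRAY" --remove "$MEMBER_PART"
}

# Step 2, after swapping the hardware: copy the GPT layout from a healthy
# member onto the new disk, then give the copy fresh random GUIDs.
# (For MBR disks the classic form is: sfdisk -d /dev/sdd | sfdisk /dev/sde)
step_partition() {
    run sgdisk --replicate="$NEW_DISK" "$TEMPLATE_DISK"
    run sgdisk --randomize-guids "$NEW_DISK"
}

# Step 3: add the new partition; md then starts rebuilding onto it.
step_add() {
    run mdadm "$ARRAY" --add "$MEMBER_PART"
}

case "${1:-}" in
    remove)    step_remove ;;
    partition) step_partition ;;
    add)       step_add ;;
    *)         echo "usage: $0 remove|partition|add" >&2 ;;
esac
```

For example, if the script were saved as `replace.sh`, then `sh replace.sh remove` prints the two mdadm commands, and `DRY_RUN=0 sh replace.sh remove` runs them as root. Splitting the work into separate steps leaves room to swap the disk physically between `remove` and `partition`.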
* Re: [gentoo-user] A drive in my RAID6 has failed
From: Paul Hartman @ 2013-09-05 17:11 UTC
To: Gentoo User

On Thu, Sep 5, 2013 at 11:52 AM, Michael Orlitzky <michael@orlitzky.com> wrote:
> This is the process I always follow:
>
> http://www.howtoforge.com/replacing_hard_disks_in_a_raid1_array
>
> The sfdisk trick will save you a bit of hassle.

Thanks, it looks like I was on the right path! Crossing my fingers...
* Re: [gentoo-user] A drive in my RAID6 has failed
From: Paul Hartman @ 2013-09-06 5:46 UTC
To: Gentoo User

On Thu, Sep 5, 2013 at 12:11 PM, Paul Hartman <paul.hartman+gentoo@gmail.com> wrote:
> Thanks, it looks like I was on the right path! Crossing my fingers...

So, I probably should not have attempted to do this immediately after eating dinner. My brain was not operating at full speed, and I went ahead and pulled the drive before removing it from the array. Oops! As soon as I pulled the latch to release the drive, I had that "oh no!" moment.

Luckily, as it turns out, md (or mdadm? or udev?) was nice enough to automatically remove it for me when the drive ceased to exist. So, I simply inserted and partitioned the new drive, added it to the array and away we go!

    md0 : active raid6 sde1[6] sdd1[5] sdg1[4] sdh1[2] sdf1[1] sdi1[0]
          11720009728 blocks super 1.2 level 6, 512k chunk, algorithm 2 [6/5] [UUU_UU]
          [>....................]  recovery =  2.3% (69513216/2930002432) finish=428.7min speed=111206K/sec

When I wake up in the morning, I hope there won't be any errors.

BTW -- a couple tips I found which speed up RAID building/recovery tremendously (season to taste):

    echo 32768 > /sys/block/md0/md/stripe_cache_size
    echo 200000 > /proc/sys/dev/raid/speed_limit_max
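For the first tip, a commonly cited estimate of stripe-cache memory is page size × number of member disks × stripe_cache_size. Assuming 4 KiB pages, this six-disk array at 32768 entries would pin about 4096 × 6 × 32768 bytes = 768 MiB of RAM, compared with roughly 3 MiB at the default of 128. The kernel's md documentation of this era also lists 32768 as the upper limit for this setting.

The second value is in KiB/s. The stock kernel default for speed_limit_max is already 200000, which fits the "not more than 200000 KB/sec" line in the dmesg output later in the thread. On an unmodified kernel, that second line may therefore change nothing.

The sketch below computes the memory estimate and saves the old stripe_cache_size so it can be restored after the rebuild. Both helpers are hypothetical.

```shell
#!/bin/sh
# stripe_cache_mib: hypothetical helper giving a rough estimate, in MiB,
# of the memory used by an md stripe cache, via the commonly cited
# formula page_size * member_disks * stripe_cache_size.
stripe_cache_mib() {
    entries=$1            # stripe_cache_size value
    disks=$2              # number of member disks in the array
    page=${3:-4096}       # page size in bytes; assumes 4 KiB pages by default
    echo $(( page * disks * entries / 1048576 ))
}

# set_stripe_cache: hypothetical helper that writes a new stripe_cache_size
# for an array (e.g. md0) and prints the previous value, so it can be
# restored once recovery is done. Needs root to write to sysfs.
set_stripe_cache() {
    md=$1
    entries=$2
    f=/sys/block/$md/md/stripe_cache_size
    old=$(cat "$f") || return 1
    echo "$entries" > "$f" || return 1
    echo "$old"
}
```

For example, `stripe_cache_mib 32768 6` prints 768. To tune and later restore the setting, run `old=$(set_stripe_cache md0 32768)` before the rebuild, then `echo "$old" > /sys/block/md0/md/stripe_cache_size` after recovery has finished.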
* Re: [gentoo-user] A drive in my RAID6 has failed
From: Paul Hartman @ 2013-09-06 15:40 UTC
To: Gentoo User

On Fri, Sep 6, 2013 at 12:46 AM, Paul Hartman <paul.hartman+gentoo@gmail.com> wrote:
> So, I simply inserted and partitioned the new drive, added it to the
> array and away we go!
>
> md0 : active raid6 sde1[6] sdd1[5] sdg1[4] sdh1[2] sdf1[1] sdi1[0]
>       11720009728 blocks super 1.2 level 6, 512k chunk, algorithm 2 [6/5] [UUU_UU]
>       [>....................]  recovery =  2.3% (69513216/2930002432) finish=428.7min speed=111206K/sec
>
> When I wake up in the morning, I hope there won't be any errors.

Success! It took 10 hours to rebuild the drive (speeds near the start of the disk are significantly faster than those near the end of the disk, so early estimates quoted by /proc/mdstat above were overly optimistic):

    [3720270.120695] md: bind<sde1>
    [3720270.162933] RAID conf printout:
    [3720270.162942]  --- level:6 rd:6 wd:5
    [3720270.162949]  disk 0, o:1, dev:sdi1
    [3720270.162954]  disk 1, o:1, dev:sdf1
    [3720270.162958]  disk 2, o:1, dev:sdh1
    [3720270.162962]  disk 3, o:1, dev:sde1
    [3720270.162965]  disk 4, o:1, dev:sdg1
    [3720270.162969]  disk 5, o:1, dev:sdd1
    [3720270.163060] md: recovery of RAID array md0
    [3720270.163067] md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
    [3720270.163071] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
    [3720270.163085] md: using 128k window, over a total of 2930002432k.
    [3756293.459324] md: md0: recovery done.
    [3756294.797961] RAID conf printout:
    [3756294.797969]  --- level:6 rd:6 wd:6
    [3756294.797974]  disk 0, o:1, dev:sdi1
    [3756294.797979]  disk 1, o:1, dev:sdf1
    [3756294.797982]  disk 2, o:1, dev:sdh1
    [3756294.797986]  disk 3, o:1, dev:sde1
    [3756294.797989]  disk 4, o:1, dev:sdg1
    [3756294.797992]  disk 5, o:1, dev:sdd1
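The numbers in this message can be cross-checked against each other.

* **Estimate.** mdstat's finish= figure is roughly the remaining KiB divided by the current speed. Here that is (2930002432 - 69513216) / 111206 KiB/s / 60 ≈ 428.7 minutes, which matches the estimate quoted above.
* **Actual time.** The dmesg timestamps are seconds since boot. "recovery of RAID array md0" is logged at 3720270.163060 and "recovery done" at 3756293.459324. That is about 36023 seconds, or 10.0 hours.
* **Average speed.** Over that time the average works out to roughly 81336 KiB/s.

The awk helpers below simply redo this arithmetic. Their names are hypothetical, and the formula for finish= is an approximation.

```shell
#!/bin/sh
# Hypothetical helpers that redo the rebuild arithmetic with awk.

# eta_min: minutes left = (total KiB - done KiB) / current KiB/s / 60,
# which is roughly how mdstat's finish= estimate behaves.
eta_min() {
    awk -v d="$1" -v t="$2" -v s="$3" 'BEGIN { printf "%.1f\n", (t - d) / s / 60 }'
}

# elapsed_h: hours between two dmesg timestamps (seconds since boot).
elapsed_h() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", (b - a) / 3600 }'
}

# avg_speed: average KiB/s for rebuilding total_kib between two timestamps.
avg_speed() {
    awk -v t="$1" -v a="$2" -v b="$3" 'BEGIN { printf "%.0f\n", t / (b - a) }'
}
```

With the values from the thread:

* `eta_min 69513216 2930002432 111206` prints 428.7.
* `elapsed_h 3720270.163060 3756293.459324` prints 10.0.
* `avg_speed 2930002432 3720270.163060 3756293.459324` prints 81336.

The average over the whole disk was therefore about 73% of the speed seen at 2.3%.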