public inbox for gentoo-user@lists.gentoo.org
* [gentoo-user] A drive in my RAID6 has failed
From: Paul Hartman @ 2013-09-05 16:49 UTC
  To: Gentoo User

Hi,

I woke up this morning to see the dreaded email from mdadm telling me
one of my drives failed overnight, while I was happily dreaming about
cute puppies and kittens installing a rainbow-colored roof on my
house. The array is a RAID6 (two parity drives) and this is the
current state:

md0 : active raid6 sdd1[5] sdg1[4] sde1[3](F) sdh1[2] sdf1[1] sdi1[0]
      11720009728 blocks super 1.2 level 6, 512k chunk, algorithm 2 [6/5] [UUU_UU]
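
The (F) flag and the underscore in [UUU_UU] are what identify the faulty member. A minimal sketch of picking that out of /proc/mdstat-style text follows; the function name failed_members is made up for illustration, and the sample line is copied from the output above.

```shell
#!/bin/sh
# List md member devices flagged (F) (faulty) in /proc/mdstat-style text.
# Reads stdin, so it works on saved output as well as the live file.
# failed_members is a hypothetical helper name, not part of mdadm.
failed_members() {
    awk '/^md[0-9]+ :/ {
        for (i = 3; i <= NF; i++)
            if ($i ~ /\(F\)$/) {
                name = $i
                sub(/\[.*/, "", name)   # strip the "[3](F)" suffix
                print $1 ": " name
            }
    }'
}

# Sample line copied from the message above.
printf '%s\n' \
  'md0 : active raid6 sdd1[5] sdg1[4] sde1[3](F) sdh1[2] sdf1[1] sdi1[0]' \
  | failed_members
# prints "md0: sde1"
```

On a live system the same function could be fed with `failed_members < /proc/mdstat`.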

I've been using RAID in Linux for years, but this is actually the
first time I've had a disk fail in one.

If I remember correctly, the process should be as simple as:

#remove the failed disk from the array:
mdadm /dev/md0 -r /dev/sde1

#pull the drive, replace with new one, partition it, then add it to the array:
mdadm /dev/md0 -a /dev/sde1

and sit back and eat popcorn while I enjoy the blinkenlights for the
next several hours/days? :) Any advice/suggestions for managing this
process any differently?
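
The steps described above can be written out as a dry-run sketch that only prints the commands. The device names are the ones from this message and are assumptions for any other system; in particular, the replacement disk may not come back as sde. The explicit --fail step and the by-id listing are additions that were not in the original plan.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands instead of running them.
# ARRAY, OLD and NEW are placeholders taken from the message above.
ARRAY=/dev/md0
OLD=/dev/sde1
NEW=/dev/sde1   # assumption: the new disk shows up under the same name

replace_plan() {
    echo "mdadm --detail $ARRAY"        # confirm which member is faulty
    echo "ls -l /dev/disk/by-id/"       # note the serial of the disk to pull
    echo "mdadm $ARRAY --fail $OLD"     # mark it faulty if md has not already
    echo "mdadm $ARRAY --remove $OLD"   # long form of -r
    echo "# swap the disk and partition the new one"
    echo "mdadm $ARRAY --add $NEW"      # long form of -a; recovery starts
    echo "cat /proc/mdstat"             # watch the rebuild progress
}

replace_plan
```

Printing the plan first makes it easy to check the device names before anything touches the array.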

For now I have unmounted the filesystem that sits atop it, to prevent
any more writes from occurring, just in case...

Thanks,
Paul



* Re: [gentoo-user] A drive in my RAID6 has failed
From: Michael Orlitzky @ 2013-09-05 16:52 UTC
  To: gentoo-user

On 09/05/2013 12:49 PM, Paul Hartman wrote:
> Hi,
> 
> I woke up this morning to see the dreaded email from mdadm telling me
> one of my drives failed overnight, while I was happily dreaming about
> cute puppies and kittens installing a rainbow-colored roof on my
> house. The array is a RAID6 (two parity drives) and this is the
> current state:
> 
> md0 : active raid6 sdd1[5] sdg1[4] sde1[3](F) sdh1[2] sdf1[1] sdi1[0]
>       11720009728 blocks super 1.2 level 6, 512k chunk, algorithm 2 [6/5] [UUU_UU]
> 
> I've been using RAID in Linux for years, but this is actually the
> first time I've had a disk fail in one.
> 
> If I remember correctly, the process should be as simple as:
> 
> #remove the failed disk from the array:
> mdadm /dev/md0 -r /dev/sde1
> 
> #pull the drive, replace with new one, partition it, then add it to the array:
> mdadm /dev/md0 -a /dev/sde1
> 
> and sit back and eat popcorn while I enjoy the blinkenlights for the
> next several hours/days? :) Any advice/suggestions for managing this
> process any differently?
> 

This is the process I always follow:

  http://www.howtoforge.com/replacing_hard_disks_in_a_raid1_array

The sfdisk trick will save you a bit of hassle.
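
For readers who do not follow the link: the sfdisk trick is, as far as I understand it, dumping the partition table of a healthy member disk and writing it to the blank replacement. A dry-run sketch is below; GOOD, NEW and the file name table.dump are assumed names for illustration. Swapping the two devices would overwrite the healthy disk's table, so the sketch only prints the commands.

```shell
#!/bin/sh
# Dry-run sketch of copying a partition table from a healthy member
# disk to the replacement. GOOD and NEW are assumptions; check them
# twice, because reversing them clobbers the healthy disk's table.
GOOD=/dev/sdd
NEW=/dev/sde

copy_table_plan() {
    echo "sfdisk -d $GOOD > table.dump"   # dump the layout as text
    echo "sfdisk $NEW < table.dump"       # write the same layout to the new disk
    echo "sfdisk -l $NEW"                 # list the result before adding to md
}

copy_table_plan
```

One caveat, stated as an assumption about this setup: disks of this size normally carry a GPT label, and older sfdisk releases did not understand GPT. On such systems sgdisk from the gdisk package is the usual alternative.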




* Re: [gentoo-user] A drive in my RAID6 has failed
From: Paul Hartman @ 2013-09-05 17:11 UTC
  To: Gentoo User

On Thu, Sep 5, 2013 at 11:52 AM, Michael Orlitzky <michael@orlitzky.com> wrote:
> This is the process I always follow:
>
>   http://www.howtoforge.com/replacing_hard_disks_in_a_raid1_array
>
> The sfdisk trick will save you a bit of hassle.

Thanks, it looks like I was on the right path! Crossing my fingers...



* Re: [gentoo-user] A drive in my RAID6 has failed
From: Paul Hartman @ 2013-09-06  5:46 UTC
  To: Gentoo User

On Thu, Sep 5, 2013 at 12:11 PM, Paul Hartman
<paul.hartman+gentoo@gmail.com> wrote:
> On Thu, Sep 5, 2013 at 11:52 AM, Michael Orlitzky <michael@orlitzky.com> wrote:
>> This is the process I always follow:
>>
>>   http://www.howtoforge.com/replacing_hard_disks_in_a_raid1_array
>>
>> The sfdisk trick will save you a bit of hassle.
>
> Thanks, it looks like I was on the right path! Crossing my fingers...

So, I probably should not have attempted to do this immediately after
eating dinner. My brain was not operating at full speed, and I went
ahead and pulled the drive before removing it from the array. Oops! As
soon as I pulled the latch to release the drive, I had that "oh no!"
moment. Luckily, as it turns out, md (or mdadm? or udev?) was nice
enough to automatically remove it for me when the drive ceased to
exist.

So, I simply inserted and partitioned the new drive, added it to the
array and away we go!

md0 : active raid6 sde1[6] sdd1[5] sdg1[4] sdh1[2] sdf1[1] sdi1[0]
      11720009728 blocks super 1.2 level 6, 512k chunk, algorithm 2 [6/5] [UUU_UU]
      [>....................]  recovery =  2.3% (69513216/2930002432) finish=428.7min speed=111206K/sec

When I wake up in the morning, I hope there won't be any errors.


BTW -- a couple tips I found which speed up RAID building/recovery
tremendously (season to taste):

echo 32768 > /sys/block/md0/md/stripe_cache_size
echo 200000 > /proc/sys/dev/raid/speed_limit_max
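
A larger stripe cache costs RAM. As I understand the kernel md documentation, the memory used is roughly page size times number of member disks times stripe_cache_size. The sketch below works that out for this array; the helper name stripe_cache_mib and the 4096-byte page size are assumptions.

```shell
#!/bin/sh
# Approximate RAM used by the md stripe cache, in MiB:
#   page_size * nr_disks * stripe_cache_size
# stripe_cache_mib is a hypothetical helper; 4096 is an assumed page size.
stripe_cache_mib() {
    entries=$1
    disks=$2
    page_size=${3:-4096}
    echo $(( entries * disks * page_size / 1024 / 1024 ))
}

stripe_cache_mib 32768 6   # the setting above on this 6-disk array: 768
stripe_cache_mib 256 6     # the kernel default of 256 entries: 6
```

Two further notes, both from memory rather than from this thread: values written under /sys and /proc do not survive a reboot, and 200000 is already the kernel's default for speed_limit_max, which matches the "not more than 200000 KB/sec" line md logs during recovery.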



* Re: [gentoo-user] A drive in my RAID6 has failed
From: Paul Hartman @ 2013-09-06 15:40 UTC
  To: Gentoo User

On Fri, Sep 6, 2013 at 12:46 AM, Paul Hartman
<paul.hartman+gentoo@gmail.com> wrote:
> So, I simply inserted and partitioned the new drive, added it to the
> array and away we go!
>
> md0 : active raid6 sde1[6] sdd1[5] sdg1[4] sdh1[2] sdf1[1] sdi1[0]
>       11720009728 blocks super 1.2 level 6, 512k chunk, algorithm 2 [6/5] [UUU_UU]
>       [>....................]  recovery =  2.3% (69513216/2930002432) finish=428.7min speed=111206K/sec
>
> When I wake up in the morning, I hope there won't be any errors.

Success! It took 10 hours to rebuild the drive (speeds near the start
of the disk are significantly faster than those near the end of the
disk, so early estimates quoted by /proc/mdstat above were overly
optimistic):

[3720270.120695] md: bind<sde1>
[3720270.162933] RAID conf printout:
[3720270.162942]  --- level:6 rd:6 wd:5
[3720270.162949]  disk 0, o:1, dev:sdi1
[3720270.162954]  disk 1, o:1, dev:sdf1
[3720270.162958]  disk 2, o:1, dev:sdh1
[3720270.162962]  disk 3, o:1, dev:sde1
[3720270.162965]  disk 4, o:1, dev:sdg1
[3720270.162969]  disk 5, o:1, dev:sdd1
[3720270.163060] md: recovery of RAID array md0
[3720270.163067] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
[3720270.163071] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
[3720270.163085] md: using 128k window, over a total of 2930002432k.
[3756293.459324] md: md0: recovery done.
[3756294.797961] RAID conf printout:
[3756294.797969]  --- level:6 rd:6 wd:6
[3756294.797974]  disk 0, o:1, dev:sdi1
[3756294.797979]  disk 1, o:1, dev:sdf1
[3756294.797982]  disk 2, o:1, dev:sdh1
[3756294.797986]  disk 3, o:1, dev:sde1
[3756294.797989]  disk 4, o:1, dev:sdg1
[3756294.797992]  disk 5, o:1, dev:sdd1
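
The gap between the early estimate and the real duration can be checked from the numbers in this thread. The sketch below takes the two dmesg timestamps and the figures from the 2.3% mdstat snapshot, all copied from above; only the variable and function names are made up.

```shell
#!/bin/sh
# Figures copied from the mdstat and dmesg output in this thread.
start=3720270.163060     # "md: recovery of RAID array md0"
end=3756293.459324       # "md: md0: recovery done."
total_k=2930002432       # per-device size in K, from mdstat
done_k=69513216          # position at the 2.3% snapshot
speed_k=111206           # K/sec reported at that snapshot

rebuild_stats() {
    awk -v s="$start" -v e="$end" -v t="$total_k" \
        -v d="$done_k" -v v="$speed_k" 'BEGIN {
        elapsed = e - s
        printf "actual: %.1f h, average %.0f K/sec\n", elapsed / 3600, t / elapsed
        printf "early estimate: %.1f min\n", (t - d) / v / 60
    }'
}

rebuild_stats
```

The second line reproduces the finish=428.7min figure from mdstat, which simply assumed the speed at that moment would hold. The average over the whole rebuild comes out at roughly 81000 K/sec, well below the 111206 K/sec seen near the start of the disk.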


