* [gentoo-user] which linux RAID setup to choose?
@ 2020-05-03 5:44 Caveman Al Toraboran
2020-05-03 7:53 ` hitachi303
` (3 more replies)
0 siblings, 4 replies; 25+ messages in thread
From: Caveman Al Toraboran @ 2020-05-03 5:44 UTC (permalink / raw
To: Gentoo
hi - i'm about to set up my 1st RAID, and i'd
appreciate it if any of you would volunteer some
time to share your valuable experience on this
subject.
my scenario
-----------
0. i don't boot from the RAID.
1. read is as important as write. i don't
have any application-specific scenario that
makes me somehow favor one over another.
so RAIDs that speed up the read (or write)
while significantly harming the write (or
read) are not welcome.
2. replacing failed disks may take a week or
two. so, i guess that i may have several
disks fail one after another in the 1-2
weeks (especially if they were bought
about the same time).
3. i would like to be able to grow the RAID's
total space (as needed), and increase its
reliability (i.e. duplicates/parities) as
needed.
e.g. suppose that i got a 2TB RAID that
tolerates 1 disk failure. i'd like, at
some point, to have the following options:
* only increase the total space (e.g.
make it 3TB), without increasing
failure tolerance (so 2 disk failure
would result in data loss).
* or, only increase the failure tolerance
(e.g. such that 2 disks failure would
not lead to data loss), without
increasing the total space (e.g. space
remains 2TB).
* or, increase, both, the space and the
failure tolerance at the same time.
4. only interested in software RAID.
my thought
----------
i think these are not suitable:
* RAID 0: fails to satisfy point (3).
* RAID 1: fails to satisfy points (1) and (3).
* RAIDs 4 to 6: fail to satisfy point (3)
since they are stuck with a fixed tolerance
towards failing disks (i.e. RAIDs 4 and 5
tolerate only 1 disk failure, and RAID 6
tolerates only 2).
this leaves me with RAID 10, with the "far"
layout. e.g. --layout=n2 would tolerate the
failure of two disks, --layout=n3 three, etc. or
is it? (i'm not sure).
my questions
------------
Q1: which RAID setup would you recommend?
Q2: how would the total number of disks in a
RAID10 setup affect the tolerance towards
the failing disks?
if the total number of disks is even, then
it is easy to see how this is equivalent
to the classical RAID 1+0 as shown in
md(4), where any disk failure is tolerated
for as long as each RAID1 group has 1 disk
failure only.
so, we get the following combinations of
disk failures that, if they happen, we won't
lose any data:
RAID0
------^------
RAID1 RAID1
--^-- --^--
F . . . < cases with
. F . . < single disk
. . F . < failures
. . . F <
F . . F < cases with
. F F . < two disk
. F . F < failures
F . F . <
. F F . <
this gives us 4+5=9 possible disk failure
scenarios that we can survive without
any data loss.
but, when the number of disks is odd, then
written bytes and their duplicates will
start to wrap around, and it is difficult for
me to intuitively see how this would
affect the total number of scenarios
where i will survive a disk failure.
Q3: what are the future growth/shrinkage
options for a RAID10 setup? e.g. with
respect to these:
1. read/write speed.
2. tolerance guarantee towards failing
disks.
3. total available space.
rgrds,
cm.
* Re: [gentoo-user] which linux RAID setup to choose?
2020-05-03 5:44 [gentoo-user] which linux RAID setup to choose? Caveman Al Toraboran
@ 2020-05-03 7:53 ` hitachi303
2020-05-03 9:23 ` Wols Lists
2020-05-03 9:14 ` Wols Lists
` (2 subsequent siblings)
3 siblings, 1 reply; 25+ messages in thread
From: hitachi303 @ 2020-05-03 7:53 UTC (permalink / raw
To: gentoo-user
On 03.05.2020 at 07:44, Caveman Al Toraboran wrote:
> * RAIDs 4 to 6: fail to satisfy point (3)
> since they are stuck with a fixed tolerance
> towards failing disks (i.e. RAIDs 4 and 5
> tolerate only 1 disk failure, and RAID 6
> tolerates only 2).
As far as I remember there can be spare drives / partitions which will
replace a failed one if needed. But this does not help if drives /
partitions fail at the same moment. Under normal conditions spares will
raise the number of drives that can fail.
Nothing you asked, but I had a very bad experience with drives which spin
down by themselves to save energy (mostly labeled green or so).
Also there has been some talk about SMR
https://hardware.slashdot.org/story/20/04/19/0432229/storage-vendors-are-quietly-slipping-smr-disks-into-consumer-hard-drives
* Re: [gentoo-user] which linux RAID setup to choose?
2020-05-03 5:44 [gentoo-user] which linux RAID setup to choose? Caveman Al Toraboran
2020-05-03 7:53 ` hitachi303
@ 2020-05-03 9:14 ` Wols Lists
2020-05-03 9:21 ` Caveman Al Toraboran
2020-05-03 14:27 ` Jack
2020-05-03 20:07 ` Rich Freeman
3 siblings, 1 reply; 25+ messages in thread
From: Wols Lists @ 2020-05-03 9:14 UTC (permalink / raw
To: gentoo-user
On 03/05/20 06:44, Caveman Al Toraboran wrote:
> hi - i'm about to set up my 1st RAID, and i'd
> appreciate it if any of you would volunteer some
> time to share your valuable experience on this
> subject.
>
> my scenario
> -----------
>
> 0. i don't boot from the RAID.
>
> 1. read is as important as write. i don't
> have any application-specific scenario that
> makes me somehow favor one over another.
> so RAIDs that speed up the read (or write)
> while significantly harming the write (or
> read) are not welcome.
>
> 2. replacing failed disks may take a week or
> two. so, i guess that i may have several
> disks fail one after another in the 1-2
> weeks (especially if they were bought
> about the same time).
>
> 3. i would like to be able to grow the RAID's
> total space (as needed), and increase its
> reliability (i.e. duplicates/parities) as
> needed.
>
> e.g. suppose that i got a 2TB RAID that
> tolerates 1 disk failure. i'd like, at
> some point, to have the following options:
>
> * only increase the total space (e.g.
> make it 3TB), without increasing
> failure tolerance (so 2 disk failure
> would result in data loss).
>
> * or, only increase the failure tolerance
> (e.g. such that 2 disks failure would
> not lead to data loss), without
> increasing the total space (e.g. space
> remains 2TB).
>
> * or, increase, both, the space and the
> failure tolerance at the same time.
>
> 4. only interested in software RAID.
>
> my thought
> ----------
>
> i think these are not suitable:
>
> * RAID 0: fails to satisfy point (3).
>
> * RAID 1: fails to satisfy points (1) and (3).
>
> * RAIDs 4 to 6: fail to satisfy point (3)
> since they are stuck with a fixed tolerance
> towards failing disks (i.e. RAIDs 4 and 5
> tolerate only 1 disk failure, and RAID 6
> tolerates only 2).
>
>
> this leaves me with RAID 10, with the "far"
> layout. e.g. --layout=n2 would tolerate the
> failure of two disks, --layout=n3 three, etc. or
> is it? (i'm not sure).
>
> my questions
> ------------
>
> Q1: which RAID setup would you recommend?
I'd recommend having a spare in the array. That way, a single failure
would not affect redundancy at all. You can then replace the failed
drive at your leisure.
If you want to grow the array, I'd also suggest "raid 5 + spare". That's
probably better than 6 for writing, but 6 is better than 5 for
redundancy. Look at having a journal - that could improve write speed
for raid 6.
>
> Q2: how would the total number of disks in a
> RAID10 setup affect the tolerance towards
> the failing disks?
>
Sadly, it doesn't. If you have two copies, losing two disks COULD take
out your raid.
> if the total number of disks is even, then
> it is easy to see how this is equivalent
> to the classical RAID 1+0 as shown in
> md(4), where any disk failure is tolerated
> for as long as each RAID1 group has 1 disk
> failure only.
That's a gamble ...
>
> so, we get the following combinations of
> disk failures that, if they happen, we won't
> lose any data:
>
> RAID0
> ------^------
> RAID1 RAID1
> --^-- --^--
> F . . . < cases with
> . F . . < single disk
> . . F . < failures
> . . . F <
>
> F . . F < cases with
> . F F . < two disk
> . F . F < failures
> F . F . <
> . F F . <
>
> this gives us 4+5=9 possible disk failure
> scenarios that we can survive without
> any data loss.
>
> but, when the number of disks is odd, then
> written bytes and their duplicates will
> start to wrap around, and it is difficult for
> me to intuitively see how this would
> affect the total number of scenarios
> where i will survive a disk failure.
>
> Q3: what are the future growth/shrinkage
> options for a RAID10 setup? e.g. with
> respect to these:
>
> 1. read/write speed.
iirc far is good for speed.
> 2. tolerance guarantee towards failing
> disks.
Guarantees? If you have two mirrors, the guarantee is just ONE disk. Yes,
you can gamble on losing more.
> 3. total available space.
iirc you can NOT grow the far layout.
>
> rgrds,
> cm.
>
You have looked at the wiki - yes I know I push it regularly :-)
https://raid.wiki.kernel.org/index.php/Linux_Raid
Cheers,
Wol
* Re: [gentoo-user] which linux RAID setup to choose?
2020-05-03 9:14 ` Wols Lists
@ 2020-05-03 9:21 ` Caveman Al Toraboran
0 siblings, 0 replies; 25+ messages in thread
From: Caveman Al Toraboran @ 2020-05-03 9:21 UTC (permalink / raw
To: gentoo-user@lists.gentoo.org
On Sunday, May 3, 2020 1:14 PM, Wols Lists <antlists@youngman.org.uk> wrote:
> > Q3: what are the future growth/shrinkage
> > options for a RAID10 setup? e.g. with
> > respect to these:
> >
> > 1. read/write speed.
> >
>
> iirc far is good for speed.
>
> > 2. tolerance guarantee towards failing
> > disks.
> >
>
> Guarantees? If you have two mirrors, the guarantee is just ONE disk. Yes,
> you can gamble on losing more.
>
> > 3. total available space.
> >
>
> iirc you can NOT grow the far layout.
sorry, typo, i meant "near" (the command was right
though --layout=n2)
* Re: [gentoo-user] which linux RAID setup to choose?
2020-05-03 7:53 ` hitachi303
@ 2020-05-03 9:23 ` Wols Lists
2020-05-03 17:55 ` Caveman Al Toraboran
0 siblings, 1 reply; 25+ messages in thread
From: Wols Lists @ 2020-05-03 9:23 UTC (permalink / raw
To: gentoo-user
On 03/05/20 08:53, hitachi303 wrote:
> Nothing you asked, but I had a very bad experience with drives which spin
> down by themselves to save energy (mostly labeled green or so).
Good catch!
For anything above raid 1, MAKE SURE your drives support SCT/ERC. For
example, Seagate Barracudas are very popular desktop drives, but I guess
maybe HALF of the emails asking for help recovering an array on the raid
list involve them dying ...
(I've got two :-( but my new system - when I get it running - has
ironwolves instead.)
Cheers,
Wol
* Re: [gentoo-user] which linux RAID setup to choose?
2020-05-03 5:44 [gentoo-user] which linux RAID setup to choose? Caveman Al Toraboran
2020-05-03 7:53 ` hitachi303
2020-05-03 9:14 ` Wols Lists
@ 2020-05-03 14:27 ` Jack
2020-05-03 21:46 ` Caveman Al Toraboran
2020-05-03 20:07 ` Rich Freeman
3 siblings, 1 reply; 25+ messages in thread
From: Jack @ 2020-05-03 14:27 UTC (permalink / raw
To: gentoo-user
On 5/3/20 1:44 AM, Caveman Al Toraboran wrote:
> [snip]...
> so, we get the following combinations of
> disk failures that, if they happen, we won't
> lose any data:
>
> RAID0
> ------^------
> RAID1 RAID1
> --^-- --^--
> F . . . < cases with
> . F . . < single disk
> . . F . < failures
> . . . F <
>
> F . . F < cases with
> . F F . < two disk
> . F . F < failures
> F . F . <
> . F F . <
>
> this gives us 4+5=9 possible disk failure
> scenarios that we can survive without
> any data loss.
Minor point - you have one duplicate line there ". f f ." which is the
second and last line of the second group. No effect on anything else in
the discussion.
Trying to help thinking about odd numbers of disks, if you are still
allowing only one disk to fail, then you can think about mirroring half
disks, so each disk has half of it mirrored to a different disk, instead
of drives always being mirrored in pairs.
* Re: [gentoo-user] which linux RAID setup to choose?
2020-05-03 9:23 ` Wols Lists
@ 2020-05-03 17:55 ` Caveman Al Toraboran
2020-05-03 18:04 ` Dale
` (2 more replies)
0 siblings, 3 replies; 25+ messages in thread
From: Caveman Al Toraboran @ 2020-05-03 17:55 UTC (permalink / raw
To: gentoo-user@lists.gentoo.org
On Sunday, May 3, 2020 1:23 PM, Wols Lists <antlists@youngman.org.uk> wrote:
> For anything above raid 1, MAKE SURE your drives support SCT/ERC. For
> example, Seagate Barracudas are very popular desktop drives, but I guess
> maybe HALF of the emails asking for help recovering an array on the raid
> list involve them dying ...
>
> (I've got two :-( but my new system - when I get it running - has
> ironwolves instead.)
that's very scary.
just to double check: are those help emails about
linux's software RAID? or is it about hardware
RAIDs?
the reason i ask about software vs. hardware, is
because of this wiki article [1] which seems to
suggest that mdadm handles error recovery by
waiting for up to 30 seconds (set in
/sys/block/sd*/device/timeout) after which the
device is reset.
am i missing something? to me it seems that [1]
seems to suggest that linux software raid has a
reliable way to handle the issue? since i guess
all disks support resetting well?
[1] https://en.wikipedia.org/wiki/Error_recovery_control#Software_RAID
* Re: [gentoo-user] which linux RAID setup to choose?
2020-05-03 17:55 ` Caveman Al Toraboran
@ 2020-05-03 18:04 ` Dale
2020-05-03 18:29 ` Mark Knecht
2020-05-03 21:22 ` antlists
2 siblings, 0 replies; 25+ messages in thread
From: Dale @ 2020-05-03 18:04 UTC (permalink / raw
To: gentoo-user
Caveman Al Toraboran wrote:
> On Sunday, May 3, 2020 1:23 PM, Wols Lists <antlists@youngman.org.uk> wrote:
>
>> For anything above raid 1, MAKE SURE your drives support SCT/ERC. For
>> example, Seagate Barracudas are very popular desktop drives, but I guess
>> maybe HALF of the emails asking for help recovering an array on the raid
>> list involve them dying ...
>>
>> (I've got two :-( but my new system - when I get it running - has
>> ironwolves instead.)
> that's very scary.
>
> just to double check: are those help emails about
> linux's software RAID? or is it about hardware
> RAIDs?
>
> the reason i ask about software vs. hardware, is
> because of this wiki article [1] which seems to
> suggest that mdadm handles error recovery by
> waiting for up to 30 seconds (set in
> /sys/block/sd*/device/timeout) after which the
> device is reset.
>
> am i missing something? to me it seems that [1]
> seems to suggest that linux software raid has a
> reliable way to handle the issue? since i guess
> all disks support resetting well?
>
> [1] https://en.wikipedia.org/wiki/Error_recovery_control#Software_RAID
>
>
>
I'd like to add something about the PMR/SMR thing. I bought an SMR drive
without knowing it. Now when I search for a hard drive, I add NAS to
the search string. That seems to weed out the SMR type drives. Once I
find an exact model, I google it up to confirm. So far, that little
trick has worked pretty well. It may be something you want to consider
using as well. NAS drives tend to be more robust it seems. Given you
are using RAID, you likely want a more robust and dependable drive, if
drives can be put into that category nowadays. :/
Hope that helps.
Dale
:-) :-)
* Re: [gentoo-user] which linux RAID setup to choose?
2020-05-03 17:55 ` Caveman Al Toraboran
2020-05-03 18:04 ` Dale
@ 2020-05-03 18:29 ` Mark Knecht
2020-05-03 20:16 ` Rich Freeman
2020-05-03 21:22 ` antlists
2 siblings, 1 reply; 25+ messages in thread
From: Mark Knecht @ 2020-05-03 18:29 UTC (permalink / raw
To: Gentoo User
On Sun, May 3, 2020 at 10:56 AM Caveman Al Toraboran <
toraboracaveman@protonmail.com> wrote:
>
> On Sunday, May 3, 2020 1:23 PM, Wols Lists <antlists@youngman.org.uk>
wrote:
>
> > For anything above raid 1, MAKE SURE your drives support SCT/ERC. For
> > example, Seagate Barracudas are very popular desktop drives, but I guess
> > maybe HALF of the emails asking for help recovering an array on the raid
> > list involve them dying ...
> >
> > (I've got two :-( but my new system - when I get it running - has
> > ironwolves instead.)
>
> that's very scary.
>
> just to double check: are those help emails about
> linux's software RAID? or is it about hardware
> RAIDs?
>
> the reason i ask about software vs. hardware, is
> because of this wiki article [1] which seems to
> suggest that mdadm handles error recovery by
> waiting for up to 30 seconds (set in
> /sys/block/sd*/device/timeout) after which the
> device is reset.
>
> am i missing something? to me it seems that [1]
> seems to suggest that linux software raid has a
> reliable way to handle the issue? since i guess
> all disks support resetting well?
>
> [1] https://en.wikipedia.org/wiki/Error_recovery_control#Software_RAID
>
When doing Linux RAID, hardware or software, make sure you get a RAID-aware
drive that supports TLER (Time Limited Error Recovery) or whatever the
vendor that makes your drive calls it. Typically this is set at about 7
seconds, guaranteeing that no matter what's going on the drive will respond
to the upper layers (mdadm) to let it know it's alive. A non-RAID drive
with no TLER feature will respond when it's ready and typically if that's
longer than 30 seconds then the RAID subsystem kicks the drive and you have
to re-add it. While there's nothing 'technically' wrong with the storage,
when the RAID rebuilds you eventually hit another one of these >30 second
waits and another drive gets kicked and you're dead.
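If you want a quick look at the kernel's side of that equation, a small
Python sketch like this (it only reads the sysfs knob quoted above; the 30
second figure is the usual default, and none of this is specific to any
vendor) will list the command timeout per disk:

    import glob

    # the kernel gives up on a command after this many seconds and resets
    # the device; a desktop drive without TLER can easily take longer than
    # that on a bad sector, which is how drives get kicked from an array
    for path in sorted(glob.glob("/sys/block/sd*/device/timeout")):
        disk = path.split("/")[3]
        with open(path) as f:
            timeout = int(f.read().strip())
        note = "  <-- shorter than a non-TLER drive's worst case" if timeout <= 30 else ""
        print(f"{disk}: kernel command timeout = {timeout}s{note}")

As I understand it, the usual fix is either to enable the drive's ERC
(where supported) or to raise that timeout well above the drive's
worst-case recovery time.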
I've used the WD Reds and WD Golds (no not sold) and never had any problem.
Build a RAID with a WD Green and you're in for trouble. ;-)))
HTH,
Mark
* Re: [gentoo-user] which linux RAID setup to choose?
2020-05-03 5:44 [gentoo-user] which linux RAID setup to choose? Caveman Al Toraboran
` (2 preceding siblings ...)
2020-05-03 14:27 ` Jack
@ 2020-05-03 20:07 ` Rich Freeman
2020-05-03 21:32 ` antlists
3 siblings, 1 reply; 25+ messages in thread
From: Rich Freeman @ 2020-05-03 20:07 UTC (permalink / raw
To: gentoo-user
On Sun, May 3, 2020 at 1:44 AM Caveman Al Toraboran
<toraboracaveman@protonmail.com> wrote:
>
> * RAID 1: fails to satisfy points (1) and (3)...
> this leaves me with RAID 10
Two things:
1. RAID 10 doesn't satisfy point 1 (read and write performance are
identical). No RAID implementation I'm aware of does.
2. Some RAID1 implementations can satisfy point 3 (expandability to
additional space and replication multiplicities), particularly when
combined with LVM.
I'd stop and think about your requirements a bit. You seem really
concerned about having identical read and write performance. RAID
implementations all have their pros and cons both in comparison with
each other, in comparison with non-RAID, and in comparison between
read and write within any particular RAID implementation.
I don't think you should focus so much on whether read=write in your
RAID. I'd focus more on whether read and write both meet your
requirements.
And on that note, what are your requirements? You haven't mentioned
what you plan to store on it or how this data will be stored or
accessed. It is hard to say whether any design will meet your
performance requirements when you haven't provided any, other than a
fairly arbitrary read=write one.
In general most RAID1 implementations aren't going to lag regular
non-RAID disk by much and will often exceed it (especially for
reading). I'm not saying RAID1 is the best option for you - I'm just
suggesting that you don't toss it out just because it reads faster
than it writes, especially in favor of RAID 10 which also reads faster
than it writes but has the additional caveat that small writes may
necessitate an additional read before write.
Not knowing your requirements it is hard to make more specific
recommendations but I'd also consider ZFS and distributed filesystems.
They have some pros and cons around flexibility and if you're
operating at a small scale - it might not be appropriate for your use
case, but you should consider them.
--
Rich
* Re: [gentoo-user] which linux RAID setup to choose?
2020-05-03 18:29 ` Mark Knecht
@ 2020-05-03 20:16 ` Rich Freeman
2020-05-03 22:52 ` Mark Knecht
0 siblings, 1 reply; 25+ messages in thread
From: Rich Freeman @ 2020-05-03 20:16 UTC (permalink / raw
To: gentoo-user
On Sun, May 3, 2020 at 2:29 PM Mark Knecht <markknecht@gmail.com> wrote:
>
> I've used the WD Reds and WD Golds (no not sold) and never had any problem.
>
Up until a few weeks ago I would have advised the same, but WD was
just caught shipping unadvertised SMR in WD Red disks. This is going
to at the very least impact your performance if you do a lot of
writes, and it can be incompatible with rebuilds in particular with
some RAID implementations. Seagate and Toshiba have also been quietly
using it but not in their NAS-labeled drives and not as extensively in
general.
At the very least you should check the model number lists that have
been recently released to see if the drive you want to get uses SMR.
I'd also get it from someplace with a generous return policy and do
some benchmarking to confirm that the drive isn't SMR (you're probably
going to have to do continuous random writes exceeding the total
capacity of the drive before you see problems - or at least quite a
bit of random writing - the amount of writing needed will be less once
the drive has been in use for a while but a fresh drive basically acts
like close to a full-disk-sized write cache as far as SMR goes).
> Build a RAID with a WD Green and you're in for trouble. ;-)))
It really depends on your RAID implementation. Certainly I agree that
it is better to have TLER, but for some RAID implementations not
having it just causes performance drops when you actually have errors
(which should be very rare). For others it can cause drives to be
dropped. I wouldn't hesitate to use greens in an mdadm or zfs array
with default options, but with something like hardware RAID I'd be
more careful. If you use aggressive timeouts on your RAID then the
Green is more likely to get kicked out.
I agree with the general sentiment to have a spare if it will take you
a long time to replace failed drives. Alternatively you can have
additional redundancy, or use a RAID alternative that basically treats
all free space as an effective spare (like many distributed
filesystems).
--
Rich
* Re: [gentoo-user] which linux RAID setup to choose?
2020-05-03 17:55 ` Caveman Al Toraboran
2020-05-03 18:04 ` Dale
2020-05-03 18:29 ` Mark Knecht
@ 2020-05-03 21:22 ` antlists
2 siblings, 0 replies; 25+ messages in thread
From: antlists @ 2020-05-03 21:22 UTC (permalink / raw
To: gentoo-user
On 03/05/2020 18:55, Caveman Al Toraboran wrote:
> On Sunday, May 3, 2020 1:23 PM, Wols Lists <antlists@youngman.org.uk> wrote:
>
>> For anything above raid 1, MAKE SURE your drives support SCT/ERC. For
>> example, Seagate Barracudas are very popular desktop drives, but I guess
>> maybe HALF of the emails asking for help recovering an array on the raid
>> list involve them dying ...
>>
>> (I've got two :-( but my new system - when I get it running - has
>> ironwolves instead.)
>
> that's very scary.
>
> just to double check: are those help emails about
> linux's software RAID? or is it about hardware
> RAIDs?
They are about linux software raid. Hardware raid won't be any better.
>
> the reason i ask about software vs. hardware, is
> because of this wiki article [1] which seems to
> suggest that mdadm handles error recovery by
> waiting for up to 30 seconds (set in
> /sys/block/sd*/device/timeout) after which the
> device is reset.
Which, if your drive does not support SCT/ERC, goes *badly* wrong.
>
> am i missing something?
Yes ...
> to me it seems that [1]
> seems to suggest that linux software raid has a
> reliable way to handle the issue?
Well, if the paragraph below were true, it would.
> since i guess all disks support resetting well?
That's the point. THEY DON'T! That's why you need SCT/ERC ...
>
> [1] https://en.wikipedia.org/wiki/Error_recovery_control#Software_RAID
>
https://raid.wiki.kernel.org/index.php/Choosing_your_hardware,_and_what_is_a_device%3F#Desktop_and_Enterprise_drives
https://raid.wiki.kernel.org/index.php/Timeout_Mismatch
Cheers,
Wol
* Re: [gentoo-user] which linux RAID setup to choose?
2020-05-03 20:07 ` Rich Freeman
@ 2020-05-03 21:32 ` antlists
2020-05-03 22:34 ` Rich Freeman
0 siblings, 1 reply; 25+ messages in thread
From: antlists @ 2020-05-03 21:32 UTC (permalink / raw
To: gentoo-user
On 03/05/2020 21:07, Rich Freeman wrote:
> I don't think you should focus so much on whether read=write in your
> RAID. I'd focus more on whether read and write both meet your
> requirements.
If you think about it, it's obvious that raid-1 will read faster than it
writes - it has to write two copies while it only reads one.
Likewise, raids 5 and 6 will be slower writing than reading - for a
normal read it only reads the data disks, but when writing it has to
write (and calculate!) parity as well.
A raid 1 should read data faster than a lone disk. A raid 5 or 6 should
read noticeably faster because it's reading across more than one disk.
If you're worried about write speeds, add a cache.
Cheers,
Wol
* Re: [gentoo-user] which linux RAID setup to choose?
2020-05-03 14:27 ` Jack
@ 2020-05-03 21:46 ` Caveman Al Toraboran
2020-05-03 22:50 ` hitachi303
2020-05-03 23:19 ` antlists
0 siblings, 2 replies; 25+ messages in thread
From: Caveman Al Toraboran @ 2020-05-03 21:46 UTC (permalink / raw
To: gentoo-user@lists.gentoo.org
On Sunday, May 3, 2020 6:27 PM, Jack <ostroffjh@users.sourceforge.net> wrote:
> Minor point - you have one duplicate line there ". f f ." which is the
> second and last line of the second group. No effect on anything else in
> the discussion.
thanks.
> Trying to help thinking about odd numbers of disks, if you are still
> allowing only one disk to fail, then you can think about mirroring half
> disks, so each disk has half of it mirrored to a different disk, instead
> of drives always being mirrored in pairs.
that definitely helped get me unstuck and continue
thinking. thanks.
curious. how do people look at --layout=n2 in the
storage industry? e.g. do they ignore the
optimistic case where 2 disk failures can be
recovered, and only assume that it protects for 1
disk failure?
i see why gambling is not worth it here, but at
the same time, i see no reason to ignore reality
(that a 2 disk failure can be saved).
e.g. a 4-disk RAID10 with --layout=n2 gives
1*4/10 + 2*4/10 = 1.2
expected recoverable disk failures. details are
below:
F . . . < recoverable
. F . . < cases with
. . F . < 1 disk
. . . F < failure
F . . F < recoverable
. F F . < cases with
. F . F < 2 disk
F . F . < failures
F F . . < not recoverable
. . F F < cases with 2 disk
< failures
now, if we do a 5-disk --layout=n2, we get:
1 (1) 2 (2) 3
(3) 4 (4) 5 (5)
6 (6) 7 (7) 8
(8) 9 (9) 10 (10)
11 (11) 12 (12) 13
(13) ...
obviously, there are 5 possible ways a single disk
may fail, out of which all of the 5 will be
recovered.
there are nchoosek(5,2) = 10 possible ways a 2
disk failure could happen, out of which 5
will be recovered:
xxx (1) xxx (2) 3
xxx 4 xxx 5 (5)
xxx (1) 2 xxx 3
xxx 4 (4) xxx (5)
1 xxx 2 xxx 3
(3) xxx (4) xxx (5)
1 xxx 2 (2) xxx
(3) xxx (4) 5 xxx
1 (1) xxx (2) xxx
(3) 4 xxx 5 xxx
so, expected recoverable disk failures for a
5-disk RAID10 --layout=n2 is:
1*5/15 + 2*5/15 = 1
so, by transforming a 4-disk RAID10 into a 5-disk
one, we increase total storage capacity by a 0.5
disk's worth of storage, while losing the ability
to recover 0.2 disks.
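btw, to double check these counts, here is a
small python brute-force sketch. it assumes that
near-2 puts the two copies of chunk c on disks
(2c mod n) and (2c+1 mod n), which is what the
diagrams above show; if that assumption is off,
so are the counts:

    from itertools import combinations

    def near2_chunks(n):
        # one rotation of the near-2 pattern on n disks: copies of
        # chunk c live on disks 2c mod n and (2c+1) mod n
        return [{(2 * c) % n, (2 * c + 1) % n} for c in range(n)]

    def survivable(n, failed):
        # data survives iff no chunk has both copies on failed disks
        return all(not chunk <= set(failed) for chunk in near2_chunks(n))

    for n in (4, 5, 6):
        ok1 = sum(survivable(n, f) for f in combinations(range(n), 1))
        two = list(combinations(range(n), 2))
        ok2 = sum(survivable(n, f) for f in two)
        print(f"{n} disks: {ok1}/{n} single failures ok, "
              f"{ok2}/{len(two)} double failures ok")

for n=4 and n=5 it reproduces the 4/6 and 5/10
recoverable 2-disk cases above (and 12/15 for the
6-disk case below).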
but if we extended the 4-disk RAID10 into a
6-disk --layout=n2, we will have:
6 nchoosek(6,2) - 3
= 1 * ----------------- + 2 * -----------------
6 + nchoosek(6,2) 6 + nchoosek(6,2)
= 6/21 + 2 * 12/21
= 1.4286 expected recoverable failing disks.
also, conditional on exactly 2 disks failing,
there is an 80% chance (12/15) of surviving it.
so, i wonder, is it a bad decision to go with an
even number of disks with a RAID10? what is the
right way to think to find an answer to this
question?
i guess the ultimate answer needs knowledge of
these:
* F1: probability of having 1 disks fail within
the repair window.
* F2: probability of having 2 disks fail within
the repair window.
* F3: probability of having 3 disks fail within
. the repair window.
.
.
* Fn: probability of having n disks fail within
the repair window.
* R1: probability of surviving 1 disks failure.
equals 1 with all related cases.
* R2: probability of surviving 2 disks failure.
equals 1/3 with 5-disk RAID10
equals 0.8 with a 6-disk RAID10.
* R3: probability of surviving 3 disks failure.
equals 0 with all related cases.
.
.
.
* Rn: probability of surviving n disks failure.
equals 0 with all related cases.
* L : expected cost of losing data on an array.
* D : price of a disk.
this way, the absolute expected cost when adopting
a 6-disk RAID10 is:
= 6D + F1*(1-R1)*L + F2*(1-R2)*L + F3*(1-R3)*L + ...
= 6D + F1*(1-1)*L + F2*(1-0.8)*L + F3*(1-0)*L + ...
= 6D + 0 + F2*(0.2)*L + F3*(1-0)*L + ...
and the absolute cost for a 5-disk RAID10 is:
= 5D + F1*(1-1)*L + F2*(1-0.3333)*L + F3*(1-0)*L + ...
= 5D + 0 + F2*(0.6667)*L + F3*(1-0)*L + ...
canceling identical terms, the difference cost is:
6-disk ===> 6D + 0.2*F2*L
5-disk ===> 5D + 0.6667*F2*L
from here [1] we know that a 1TB disk costs
$35.85, so:
6-disk ===> 6*35.85 + 0.2*F2*L
5-disk ===> 5*35.85 + 0.6667*F2*L
now, at which point is a 5-disk array a better
economical decision than a 6-disk one? for
simplicity, let LOL = F2*L:
5*35.85 + 0.6667 * LOL < 6*35.85 + 0.2 * LOL
0.6667*LOL - 0.2 * LOL < 6*35.85 - 5*35.85
LOL * (0.6667 - 0.2) < 6*35.85 - 5*35.85
6*35.85 - 5*35.85
LOL < -----------------
0.6667 - 0.2
LOL < 76.816
F2*L < 76.816
so, a 5-disk RAID10 is better than a 6-disk RAID10
only if:
F2*L < 76.816 bucks.
this site [2] says that 76% of seagate disks fail
per year (:D). and since disks mostly fail independently
of each other, the probability of
having 2 disks fail in a year is:
F2_year = 0.76*0.76
= 0.5776
but what is F2_week? each year has 52.1429 weeks.
let's be generous and assume that disks fail at a
uniform distribution across the year (e.g. suppose
that we bought them randomly, and not in a
single batch).
in this case, the probability of 2 disks failing
in the same week (suppose that our repair window
is 1 week):
52
F2 = 0.5776 * --------------------
52 + nchoosek(52, 2)
= 0.5776 * 0.037736
= 0.021796
let's substitute a bit:
F2 * L < 76.816 bucks.
0.021796 * L < 76.816 bucks.
L < 76.816 / 0.021796 bucks.
L < 3524.3 bucks.
so, in summary:
/------------------------------------------------\
| a 5-disk RAID10 is better than a 6-disk RAID10 |
| ONLY IF your data is WORTH LESS than 3,524.3 |
| bucks. |
\------------------------------------------------/
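the same arithmetic as a small python sketch, in
case anyone wants to poke at the inputs (the disk
price, the F2 model, and the R2 values are just
the numbers assumed above, so the output is only
as good as they are):

    from math import comb

    DISK_PRICE = 35.85     # 1TB disk price assumed above [1]
    P_FAIL_YEAR = 0.76     # per-disk annual failure rate assumed above [2]
    WEEKS = 52             # repair window of one week

    # crude model from above: chance that two failures land in the same week
    F2 = (P_FAIL_YEAR ** 2) * WEEKS / (WEEKS + comb(WEEKS, 2))

    def expected_cost(n_disks, r2, loss):
        # cost of the disks plus the expected loss from an unrecoverable
        # 2-disk failure (1-disk failures always recoverable, 3+ never)
        return n_disks * DISK_PRICE + F2 * (1 - r2) * loss

    # R2 values used above: 1/3 for the 5-disk array, 0.8 for the 6-disk one
    for loss in (1000, 3524, 10000):
        five = expected_cost(5, 1 / 3, loss)
        six = expected_cost(6, 0.8, loss)
        print(f"data worth ${loss}: 5-disk ${five:.2f} vs 6-disk ${six:.2f}")

the crossover sits right around the ~3.5k figure
in the box above.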
any thoughts? i'm a newbie. i wonder how
industry people think?
happy quarantine,
cm
------------
[1] https://www.amazon.com/WD-Blue-1TB-Hard-Drive/dp/B0088PUEPK/
[2] https://www.seagate.com/em/en/support/kb/hard-disk-drive-reliability-and-mtbf-afr-174791en/
* Re: [gentoo-user] which linux RAID setup to choose?
2020-05-03 21:32 ` antlists
@ 2020-05-03 22:34 ` Rich Freeman
0 siblings, 0 replies; 25+ messages in thread
From: Rich Freeman @ 2020-05-03 22:34 UTC (permalink / raw
To: gentoo-user
On Sun, May 3, 2020 at 5:32 PM antlists <antlists@youngman.org.uk> wrote:
>
> On 03/05/2020 21:07, Rich Freeman wrote:
> > I don't think you should focus so much on whether read=write in your
> > RAID. I'd focus more on whether read and write both meet your
> > requirements.
>
> If you think about it, it's obvious that raid-1 will read faster than it
> writes - it has to write two copies while it only reads one.
Yes. The same is true for RAID10, since it has to also write two
copies of everything.
>
> Likewise, raids 5 and 6 will be slower writing than reading - for a
> normal read it only reads the data disks, but when writing it has to
> write (and calculate!) parity as well.
Yes, but with any of the striped modes (0, 5, 6, 10) there is an
additional issue. Writes have to generally be made in entire stripes,
so if you overwrite data in-place in units smaller than an entire
stripe, then the entire stripe needs to first be read, and then it can
be overwritten again. This is an absolute requirement if there is
parity involved. If there is no parity (RAID 0,10) then an
implementation might be able to overwrite part of a stripe in place
without harming the rest.
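As a toy illustration of that overhead, this just counts chunk I/Os for
the simple read-the-whole-stripe-then-rewrite approach described above
(real implementations have smarter read-modify-write paths, so treat it
as a sketch):

    def raid5_write_ios(data_disks, chunks_updated):
        # returns (reads, writes) in chunks for updating one RAID5 stripe
        if chunks_updated >= data_disks:
            # full-stripe write: nothing to read, write data plus new parity
            return 0, data_disks + 1
        # partial write: read the whole old stripe to recompute parity,
        # then write back the changed chunks plus the parity chunk
        return data_disks, chunks_updated + 1

    for updated in (1, 2, 4):
        reads, writes = raid5_write_ios(data_disks=4, chunks_updated=updated)
        print(f"update {updated}/4 chunks: {reads} reads + {writes} writes")

A small in-place update costs several reads before it costs any writes,
which is a penalty RAID1 never pays.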
>
> A raid 1 should read data faster than a lone disk. A raid 5 or 6 should
> read noticeably faster because it's reading across more than one disk.
More-or-less. RAID 1 is going to generally benefit from lower latency
because reads can be divided across mirrored copies (and there could
be more than one replica). Any of the striped modes are going to be
the same as a single disk on latency, but will have much greater
bandwidth. That bandwidth gain applies to both reading and writing,
as long as the data is sequential.
This is why it is important to understand your application. There is
no one "best" RAID implementation. They all have pros and cons
depending on whether you care more about latency vs bandwidth and also
read vs write.
And of course RAID isn't the only solution out there for this stuff.
Distributed filesystems also have pros and cons, and often those have
multiple modes of operation on top of this (usually somewhat mirroring
the options available for RAID but across multiple hosts).
For general storage I'm using zfs with raid1 pairs of disks (the pool
can have multiple pairs), and for my NAS for larger-scale media/etc
storage I'm using lizardfs. I'd use ceph instead in any kind of
enterprise setup, but that is much more RAM-hungry and I'm cheap.
--
Rich
* Re: [gentoo-user] which linux RAID setup to choose?
2020-05-03 21:46 ` Caveman Al Toraboran
@ 2020-05-03 22:50 ` hitachi303
2020-05-04 0:29 ` Caveman Al Toraboran
2020-05-04 0:46 ` Rich Freeman
2020-05-03 23:19 ` antlists
1 sibling, 2 replies; 25+ messages in thread
From: hitachi303 @ 2020-05-03 22:50 UTC (permalink / raw
To: gentoo-user
On 03.05.2020 at 23:46, Caveman Al Toraboran wrote:
> so, in summary:
>
> /------------------------------------------------\
> | a 5-disk RAID10 is better than a 6-disk RAID10 |
> | ONLY IF your data is WORTH LESS than 3,524.3 |
> | bucks. |
> \------------------------------------------------/
>
> any thoughts? i'm a newbie. i wonder how
> industry people think?
Don't forget that having more drives increases the odds of a failing
drive. If you have infinite drives, at any given moment infinite drives
will fail. Anyway I wouldn't know how to calculate this.
Most people are limited by money and space. Even if this isn't your
problem you will always need an additional backup strategy. The whole
system can fail.
I run a system with 8 drives where two can fail and they can be hot
swapped. This is a closed-source SAS which I really like except the part
being closed source. I don't even know what kind of raid is used.
The only person I know who is running a really huge raid (I guess 2000+
drives) is comfortable with some spare drives. His raid did fail and can
fail. Data will be lost. Everything important has to be stored at a
secondary location. But they are using the raid to store data for some
days or weeks when a server is calculating stuff. If the raid fails they
have to restart the program for the calculation.
Facebook used to store data which is sometimes accessed on raids. Since
they use energy, they stored data which is nearly never accessed on Blu-ray
discs. I don't know if they still do. Reading is very slow if a
mechanical arm first needs to fetch a specific Blu-ray out of hundreds
and put it in a disc reader, but it is very energy efficient.
* Re: [gentoo-user] which linux RAID setup to choose?
2020-05-03 20:16 ` Rich Freeman
@ 2020-05-03 22:52 ` Mark Knecht
2020-05-03 23:23 ` Rich Freeman
0 siblings, 1 reply; 25+ messages in thread
From: Mark Knecht @ 2020-05-03 22:52 UTC (permalink / raw
To: Gentoo User
On Sun, May 3, 2020 at 1:16 PM Rich Freeman <rich0@gentoo.org> wrote:
>
> On Sun, May 3, 2020 at 2:29 PM Mark Knecht <markknecht@gmail.com> wrote:
> >
> > I've used the WD Reds and WD Golds (no not sold) and never had any
problem.
> >
>
> Up until a few weeks ago I would have advised the same, but WD was
> just caught shipping unadvertised SMR in WD Red disks. This is going
> to at the very least impact your performance if you do a lot of
> writes, and it can be incompatible with rebuilds in particular with
> some RAID implementations. Seagate and Toshiba have also been quietly
> using it but not in their NAS-labeled drives and not as extensively in
> general.
I read somewhere that they knew they'd been caught and were coming clean.
As I'm not buying anything at this time I didn't pay too much attention.
This link is at least similar to what I read earlier. Possibly it's of
interest.
https://www.extremetech.com/computing/309730-western-digital-comes-clean-shares-which-hard-drives-use-smr
Another case of unbridled capitalism and consumers being hurt.
Cheers,
Mark
* Re: [gentoo-user] which linux RAID setup to choose?
2020-05-03 21:46 ` Caveman Al Toraboran
2020-05-03 22:50 ` hitachi303
@ 2020-05-03 23:19 ` antlists
2020-05-04 1:33 ` Caveman Al Toraboran
1 sibling, 1 reply; 25+ messages in thread
From: antlists @ 2020-05-03 23:19 UTC (permalink / raw
To: gentoo-user
On 03/05/2020 22:46, Caveman Al Toraboran wrote:
> On Sunday, May 3, 2020 6:27 PM, Jack <ostroffjh@users.sourceforge.net> wrote:
>
>
> curious. how do people look at --layout=n2 in the
> storage industry? e.g. do they ignore the
> optimistic case where 2 disk failures can be
> recovered, and only assume that it protects for 1
> disk failure?
You CANNOT afford to be optimistic ... Murphy's law says you will lose
the wrong second disk.
>
> i see why gambling is not worth it here, but at
> the same time, i see no reason to ignore reality
> (that a 2 disk failure can be saved).
>
Don't ignore that some 2-disk failures CAN'T be saved ...
> e.g. a 4-disk RAID10 with --layout=n2 gives
>
> 1*4/10 + 2*4/10 = 1.2
>
> expected recoverable disk failures. details are
> below:
>
>
> now, if we do a 5-disk --layout=n2, we get:
>
> 1 (1) 2 (2) 3
> (3) 4 (4) 5 (5)
> 6 (6) 7 (7) 8
> (8) 9 (9) 10 (10)
> 11 (11) 12 (12) 13
> (13) ...
>
> obviously, there are 5 possible ways a single disk
> may fail, out of which all of the 5 will be
> recovered.
Don't forget a 4+spare layout, which *should* survive a 2-disk failure.
>
> there are nchoosek(5,2) = 10 possible ways a 2
> disk failure could happen, out of which 5
> will be recovered:
>
>
> so, by transforming a 4-disk RAID10 into a 5-disk
> one, we increase total storage capacity by a 0.5
> disk's worth of storage, while losing the ability
> to recover 0.2 disks.
>
> but if we extended the 4-disk RAID10 into a
> 6-disk --layout=n2, we will have:
>
> 6 nchoosek(6,2) - 3
> = 1 * ----------------- + 2 * -----------------
> 6 + nchoosek(6,2) 6 + nchoosek(6,2)
>
> = 6/21 + 2 * 12/21
>
> = 1.4286 expected recoverable failing disks.
>
> also, conditional on exactly 2 disks failing,
> there is an 80% chance (12/15) of surviving it.
>
> so, i wonder, is it a bad decision to go with an
> even number of disks with a RAID10? what is the
> right way to think to find an answer to this
> question?
>
> i guess the ultimate answer needs knowledge of
> these:
>
> * F1: probability of having 1 disks fail within
> the repair window.
> * F2: probability of having 2 disks fail within
> the repair window.
> * F3: probability of having 3 disks fail within
> . the repair window.
> .
> .
> * Fn: probability of having n disks fail within
> the repair window.
>
> * R1: probability of surviving 1 disks failure.
> equals 1 with all related cases.
> * R2: probability of surviving 2 disks failure.
> equals 1/3 with 5-disk RAID10
> equals 0.8 with a 6-disk RAID10.
> * R3: probability of surviving 3 disks failure.
> equals 0 with all related cases.
> .
> .
> .
> * Rn: probability of surviving n disks failure.
> equals 0 with all related cases.
>
> * L : expected cost of losing data on an array.
> * D : price of a disk.
Don't forget, if you have a spare disk, the repair window is the length
of time it takes to fail-over ...
>
> this way, the absolute expected cost when adopting
> a 6-disk RAID10 is:
>
> = 6D + F1*(1-R1)*L + F2*(1-R2)*L + F3*(1-R3)*L + ...
> = 6D + F1*(1-1)*L + F2*(1-0.8)*L + F3*(1-0)*L + ...
> = 6D + 0 + F2*(0.2)*L + F3*(1-0)*L + ...
>
> and the absolute cost for a 5-disk RAID10 is:
>
> = 5D + F1*(1-1)*L + F2*(1-0.3333)*L + F3*(1-0)*L + ...
> = 5D + 0 + F2*(0.6667)*L + F3*(1-0)*L + ...
>
> canceling identical terms, the difference cost is:
>
> 6-disk ===> 6D + 0.2*F2*L
> 5-disk ===> 5D + 0.6667*F2*L
>
> from here [1] we know that a 1TB disk costs
> $35.85, so:
>
> 6-disk ===> 6*35.85 + 0.2*F2*L
> 5-disk ===> 5*35.85 + 0.6667*F2*L
>
> now, at which point is a 5-disk array a better
> economical decision than a 6-disk one? for
> simplicity, let LOL = F2*L:
>
> 5*35.85 + 0.6667 * LOL < 6*35.85 + 0.2 * LOL
> 0.6667*LOL - 0.2 * LOL < 6*35.85 - 5*35.85
> LOL * (0.6667 - 0.2) < 6*35.85 - 5*35.85
>
> 6*35.85 - 5*35.85
> LOL < -----------------
> 0.6667 - 0.2
>
> LOL < 76.816
> F2*L < 76.816
>
> so, a 5-disk RAID10 is better than a 6-disk RAID10
> only if:
>
> F2*L < 76.816 bucks.
>
> this site [2] says that 76% of seagate disks fail
> per year (:D). and since disks mostly fail independently
> of each other, the probability of
> having 2 disks fail in a year is:
>
76% seems incredibly high. And no, disks do not fail independently of
each other. If you buy a bunch of identical disks, at the same time, and
stick them all in the same raid array, the chances of them all wearing
out at the same time are rather higher than random chance would suggest.
Which is why, if a raid disk fails, the advice is always to replace it
asap. And if possible, to recover the failed drive to try and copy that
rather than hammer the rest of the raid.
Bear in mind that, no matter how many drives a raid-10 has, if
you're recovering onto a new drive, the data is stored on just two of
the other drives. So the chances of them failing as they get hammered
are a lot higher.
That's why it makes a lot of sense to make sure you monitor the SMARTs,
so you can replace any of the drives that look like they're failing before they
actually do. And check the warranties. Expensive raid drives probably
have longer warranties, so when they're out of warranty consider
retiring them (they'll probably last a lot longer, but it's a judgement
call).
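Something as small as this, run from cron, is enough to catch the usual
early-warning counters (python, assuming smartmontools is installed; the
attribute names and device paths are just examples, adjust for your
setup):

    import subprocess

    WATCH = ("Reallocated_Sector_Ct", "Current_Pending_Sector",
             "Offline_Uncorrectable")

    def smart_warnings(dev):
        out = subprocess.run(["smartctl", "-A", dev],
                             capture_output=True, text=True).stdout
        warnings = []
        for line in out.splitlines():
            fields = line.split()
            # smartctl -A prints: ID# ATTRIBUTE_NAME ... RAW_VALUE
            if len(fields) > 2 and fields[1] in WATCH and fields[-1].isdigit():
                if int(fields[-1]) > 0:
                    warnings.append(f"{fields[1]}={fields[-1]}")
        return warnings

    for dev in ("/dev/sda", "/dev/sdb"):   # your array members here
        found = smart_warnings(dev)
        print(dev, "looks ok" if not found else "WARNING: " + " ".join(found))

Anything non-zero in those three is usually the cue to order a
replacement.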
All that said, I've been running a raid-1 mirror for a good few years,
and I've not had any trouble on my Barracudas.
Cheers,
Wol
* Re: [gentoo-user] which linux RAID setup to choose?
2020-05-03 22:52 ` Mark Knecht
@ 2020-05-03 23:23 ` Rich Freeman
0 siblings, 0 replies; 25+ messages in thread
From: Rich Freeman @ 2020-05-03 23:23 UTC (permalink / raw
To: gentoo-user
On Sun, May 3, 2020 at 6:52 PM Mark Knecht <markknecht@gmail.com> wrote:
>
> On Sun, May 3, 2020 at 1:16 PM Rich Freeman <rich0@gentoo.org> wrote:
> >
> > Up until a few weeks ago I would have advised the same, but WD was
> > just caught shipping unadvertised SMR in WD Red disks. This is going
> > to at the very least impact your performance if you do a lot of
> > writes, and it can be incompatible with rebuilds in particular with
> > some RAID implementations. Seagate and Toshiba have also been quietly
> > using it but not in their NAS-labeled drives and not as extensively in
> > general.
>
> I read somewhere that they knew they'd been caught and were coming clean.
Yup. WD was caught. Then they first came out with a "you're using it
wrong" sort of defense but they did list the SMR drives. Then they
came out with a bit more of an even-handed response. The others
weren't caught as far as I'm aware but probably figured the writing
was on the wall since no doubt everybody and their uncle is going to
be benchmarking every drive they own.
> Another case of unbridled capitalism and consumers being hurt.
I agree. This video has a slightly different perspective. It doesn't
disagree on that conclusion, but it does explain more of the industry
thinking that got us here (beyond the simple/obvious it saves money):
https://www.youtube.com/watch?v=gSionmmunMs
--
Rich
* Re: [gentoo-user] which linux RAID setup to choose?
2020-05-03 22:50 ` hitachi303
@ 2020-05-04 0:29 ` Caveman Al Toraboran
2020-05-04 7:50 ` hitachi303
2020-05-04 0:46 ` Rich Freeman
1 sibling, 1 reply; 25+ messages in thread
From: Caveman Al Toraboran @ 2020-05-04 0:29 UTC (permalink / raw
To: gentoo-user@lists.gentoo.org
On Monday, May 4, 2020 2:50 AM, hitachi303 <gentoo-user@konstantinhansen.de> wrote:
> On 03.05.2020 at 23:46, Caveman Al Toraboran wrote:
>
> > so, in summary:
> > /------------------------------------------------\
> > | a 5-disk RAID10 is better than a 6-disk RAID10 |
> > | ONLY IF your data is WORTH LESS than 3,524.3 |
> > | bucks. |
> > \------------------------------------------------/
> > any thoughts? i'm a newbie. i wonder how
> > industry people think?
>
> Don't forget that having more drives increases the odds of a failing
> drive. If you have infinite drives, at any given moment infinite drives
> will fail. Anyway I wouldn't know how to calculate this.
by drive, you mean a spinning hard disk?
i'm not sure how "infinite" helps here even
theoretically. e.g. say that every year, 76% of
disks fail. in the limit as the number of disks
approaches infinity, then 76% of infinity is
infinity. but, how is this useful?
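fwiw, the finite version is easy enough.
assuming each of n disks independently fails
within the year with probability p, the chance
that at least one fails is 1 - (1-p)^n:

    p = 0.76    # the (suspiciously high) annual failure rate from before
    for n in (2, 4, 6, 8):
        print(f"{n} disks: P(at least one fails) = {1 - (1 - p) ** n:.4f}")

so yes, more disks make a failure somewhere
near-certain; the question is just whether the
layout survives it.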
> Most people are limited by money and space. Even if this isn't your
> problem you will always need an additional backup strategy. The whole
> system can fail.
> I run a system with 8 drives where two can fail and they can be hot
> swapped. This is a closed-source SAS which I really like except the part
> being closed source. I don't even know what kind of raid is used.
>
> The only person I know who is running a really huge raid (I guess 2000+
> drives) is comfortable with some spare drives. His raid did fail and can
> fail. Data will be lost. Everything important has to be stored at a
> secondary location. But they are using the raid to store data for some
> days or weeks when a server is calculating stuff. If the raid fails they
> have to restart the program for the calculation.
thanks a lot. highly appreciate these tips about
how others run their storage.
however, i am not sure what is the takeaway from
this. e.g. your closed-source NAS vs. a large
RAID. they don't seem to be mutually exclusive to
me (both might be on RAID).
to me, a NAS is just a computer with RAID. no?
> Facebook used to store data which is sometimes accessed on raids. Since
> they use energy, they stored data which is nearly never accessed on Blu-ray
> discs. I don't know if they still do. Reading is very slow if a
> mechanical arm first needs to fetch a specific Blu-ray out of hundreds
> and put it in a disc reader, but it is very energy efficient.
interesting.
* Re: [gentoo-user] which linux RAID setup to choose?
2020-05-03 22:50 ` hitachi303
2020-05-04 0:29 ` Caveman Al Toraboran
@ 2020-05-04 0:46 ` Rich Freeman
2020-05-04 7:50 ` hitachi303
1 sibling, 1 reply; 25+ messages in thread
From: Rich Freeman @ 2020-05-04 0:46 UTC (permalink / raw
To: gentoo-user
On Sun, May 3, 2020 at 6:50 PM hitachi303
<gentoo-user@konstantinhansen.de> wrote:
>
> The only person I know who is running a really huge raid (I guess 2000+
> drives) is comfortable with some spare drives. His raid did fail and can
> fail. Data will be lost. Everything important has to be stored at a
> secondary location. But they are using the raid to store data for some
> days or weeks when a server is calculating stuff. If the raid fails they
> have to restart the program for the calculation.
So, if you have thousands of drives, you really shouldn't be using a
conventional RAID solution. Now, if you're just using RAID to refer
to any technology that stores data redundantly that is one thing.
However, if you wanted to stick 2000 drives into a single host using
something like mdadm/zfs, or heaven forbid a bazillion LSI HBAs with
some kind of hacked-up solution for PCIe port replication plus SATA
bus multipliers/etc, you're probably doing it wrong. (Really even
with mdadm/zfs you probably still need some kind of terribly
non-optimal solution for attaching all those drives to a single host.)
At that scale you really should be using a distributed filesystem. Or
you could use some application-level solution that accomplishes the
same thing on top of a bunch of more modest hosts running zfs/etc (the
Backblaze solution at least in the past).
The most mainstream FOSS solution at this scale is Ceph. It achieves
redundancy at the host level. That is, if you have it set up to
tolerate two failures then you can take two random hosts in the
cluster and smash their motherboards with a hammer in the middle of
operation, and the cluster will keep on working and quickly restore
its redundancy. Each host can have multiple drives, and losing any or
all of the drives within a single host counts as a single failure.
You can even do clever stuff like tell it which hosts are attached to
which circuit breakers and then you could lose all the hosts on a
single power circuit at once and it would be fine.
This also has the benefit of covering you when one of your flakey
drives causes weird bus issues that affect other drives, or one host
crashes, and so on. The redundancy is entirely at the host level so
you're protected against a much larger number of failure modes.
This sort of solution also performs much faster as data requests are
not CPU/NIC/HBA limited for any particular host. The software is
obviously more complex, but the hardware can be simpler since if you
want to expand storage you just buy more servers and plug them into
the LAN, versus trying to figure out how to cram an extra dozen hard
drives into a single host with all kinds of port multiplier games.
You can also do maintenance and just reboot an entire host while the
cluster stays online as long as you aren't messing with them all at
once.
I've gone in this general direction because I was tired of having to
try to deal with massive cases, being limited to motherboards with 6
SATA ports, adding LSI HBAs that require an 8x slot and often
conflicts with using an NVMe, and so on.
--
Rich
* Re: [gentoo-user] which linux RAID setup to choose?
2020-05-03 23:19 ` antlists
@ 2020-05-04 1:33 ` Caveman Al Toraboran
0 siblings, 0 replies; 25+ messages in thread
From: Caveman Al Toraboran @ 2020-05-04 1:33 UTC (permalink / raw
To: gentoo-user@lists.gentoo.org
On Monday, May 4, 2020 3:19 AM, antlists <antlists@youngman.org.uk> wrote:
> On 03/05/2020 22:46, Caveman Al Toraboran wrote:
>
> > On Sunday, May 3, 2020 6:27 PM, Jack ostroffjh@users.sourceforge.net wrote:
> > curious. how do people look at --layout=n2 in the
> > storage industry? e.g. do they ignore the
> > optimistic case where 2 disk failures can be
> > recovered, and only assume that it protects for 1
> > disk failure?
>
> You CANNOT afford to be optimistic ... Murphy's law says you will lose
> the wrong second disk.
so i guess your answer is: "yes, the industry
ignores the existence of optimistic cases".
if that's true, then the industry is wrong, and must
learn the following:
1. don't bet that your data's survival is
hinging on luck (you agree with this, i know).
2. don't ignore statistics that reveal the fact
that lucky cases exist.
(1) and (2) are not mutually exclusive, and
murphy's law would suggest not ignoring (2).
because, if you ignore (2), you'll end up adopting
a 5-disk RAID10 instead of the superior 6-disk
RAID10 and end up being less lucky in practice.
don't rely on luck, but why deny good luck when
it might come to you? --- two different things.
> > i see why gambling is not worth it here, but at
> > the same time, i see no reason to ignore reality
> > (that a 2 disk failure can be saved).
>
> Don't ignore that some 2-disk failures CAN'T be saved ...
yeah, i'm not. i'm just not ignoring that a 2-disk
failure might get saved.
you know... it's better to have a lil window where
some good luck may chime in than banning good
luck.
> Don't forget, if you have a spare disk, the repair window is the length
> of time it takes to fail-over ...
yup. just trying to not rely on good luck that a
spare is available. e.g. considering the case
where no spare is there.
> > this site [2] says that 76% of seagate disks fail
> > per year (:D). and since disks fail independent
> > of each other mostly, then, the probabilty of
> > having 2 disks fail in a year is:
>
> 76% seems incredibly high. And no, disks do not fail independently of
> each other. If you buy a bunch of identical disks, at the same time, and
> stick them all in the same raid array, the chances of them all wearing
> out at the same time are rather higher than random chance would suggest.
i know. i had this as a note, but then removed
it. anyway, some nitpicks:
1. dependence != correlation. you mean
correlation, not dependence. disk failures are
correlated if they are bought together, but
other disks don't cause the failure (unless
from things like heat from other disks, or
repair stress because of another disk failing).
2. i followed the extreme case where a person got
his disks purchased at a random time, so that
he was maximally lucky in that his disks didn't
synchronize. why?
(i) offers a better pessimistic result.
now we know that this probability is actually
lower than reality, which means that we know
that the 3.5k bucks is actually even lower.
this should scare us more (hence us relying on
less luck).
(ii) makes calculation easier.
* Re: [gentoo-user] which linux RAID setup to choose?
2020-05-04 0:29 ` Caveman Al Toraboran
@ 2020-05-04 7:50 ` hitachi303
0 siblings, 0 replies; 25+ messages in thread
From: hitachi303 @ 2020-05-04 7:50 UTC (permalink / raw
To: gentoo-user
On 04.05.2020 at 02:29, Caveman Al Toraboran wrote:
>> Facebook used to store data which is sometimes accessed on raids. Since
>> they use energy, they stored data which is nearly never accessed on Blu-ray
>> discs. I don't know if they still do. Reading is very slow if a
>> mechanical arm first needs to fetch a specific Blu-ray out of hundreds
>> and put it in a disc reader, but it is very energy efficient.
> interesting.
A video from 2014
https://www.facebook.com/Engineering/videos/10152128660097200/
* Re: [gentoo-user] which linux RAID setup to choose?
2020-05-04 0:46 ` Rich Freeman
@ 2020-05-04 7:50 ` hitachi303
2020-05-04 8:18 ` William Kenworthy
0 siblings, 1 reply; 25+ messages in thread
From: hitachi303 @ 2020-05-04 7:50 UTC (permalink / raw
To: gentoo-user
On 04.05.2020 at 02:46, Rich Freeman wrote:
> On Sun, May 3, 2020 at 6:50 PM hitachi303
> <gentoo-user@konstantinhansen.de> wrote:
>>
>> The only person I know who is running a really huge raid (I guess 2000+
>> drives) is comfortable with some spare drives. His raid did fail and can
>> fail. Data will be lost. Everything important has to be stored at a
>> secondary location. But they are using the raid to store data for some
>> days or weeks when a server is calculating stuff. If the raid fails they
>> have to restart the program for the calculation.
>
> So, if you have thousands of drives, you really shouldn't be using a
> conventional RAID solution. Now, if you're just using RAID to refer
> to any technology that stores data redundantly that is one thing.
> However, if you wanted to stick 2000 drives into a single host using
> something like mdadm/zfs, or heaven forbid a bazillion LSI HBAs with
> some kind of hacked-up solution for PCIe port replication plus SATA
> bus multipliers/etc, you're probably doing it wrong. (Really even
> with mdadm/zfs you probably still need some kind of terribly
> non-optimal solution for attaching all those drives to a single host.)
>
> At that scale you really should be using a distributed filesystem. Or
> you could use some application-level solution that accomplishes the
> same thing on top of a bunch of more modest hosts running zfs/etc (the
> Backblaze solution at least in the past).
>
> The most mainstream FOSS solution at this scale is Ceph. It achieves
> redundancy at the host level. That is, if you have it set up to
> tolerate two failures then you can take two random hosts in the
> cluster and smash their motherboards with a hammer in the middle of
> operation, and the cluster will keep on working and quickly restore
> its redundancy. Each host can have multiple drives, and losing any or
> all of the drives within a single host counts as a single failure.
> You can even do clever stuff like tell it which hosts are attached to
> which circuit breakers and then you could lose all the hosts on a
> single power circuit at once and it would be fine.
>
> This also has the benefit of covering you when one of your flakey
> drives causes weird bus issues that affect other drives, or one host
> crashes, and so on. The redundancy is entirely at the host level so
> you're protected against a much larger number of failure modes.
>
> This sort of solution also performs much faster as data requests are
> not CPU/NIC/HBA limited for any particular host. The software is
> obviously more complex, but the hardware can be simpler since if you
> want to expand storage you just buy more servers and plug them into
> the LAN, versus trying to figure out how to cram an extra dozen hard
> drives into a single host with all kinds of port multiplier games.
> You can also do maintenance and just reboot an entire host while the
> cluster stays online as long as you aren't messing with them all at
> once.
>
> I've gone in this general direction because I was tired of having to
> try to deal with massive cases, being limited to motherboards with 6
> SATA ports, adding LSI HBAs that require an 8x slot and often
> conflicts with using an NVMe, and so on.
So you are right. This is the way they do it. I used the term raid too
broadly.
But they still have problems with limitations: size of the room, what the
air conditioning can handle, and stuff like this.
Anyway I only wanted to point out that there are different approaches in
the industry, and saving the data at any price is not always necessary.
* Re: [gentoo-user] which linux RAID setup to choose?
2020-05-04 7:50 ` hitachi303
@ 2020-05-04 8:18 ` William Kenworthy
0 siblings, 0 replies; 25+ messages in thread
From: William Kenworthy @ 2020-05-04 8:18 UTC (permalink / raw
To: gentoo-user
On 4/5/20 3:50 pm, hitachi303 wrote:
> On 04.05.2020 at 02:46, Rich Freeman wrote:
>> On Sun, May 3, 2020 at 6:50 PM hitachi303
>> <gentoo-user@konstantinhansen.de> wrote:
>> ...
> So you are right. This is the way they do it. I used the term raid to
> broadly.
> But still they have problems with limitations. Size of room, what air
> conditioning can handle and stuff like this.
>
> Anyway I only wanted to point out that there are different approaches
> in the industries and saving the data at any price is not always
> necessary.
>
I would suggest that once you go past a few drives there are better ways.
I had two 4-disk, bcache-fronted raid 10's on two PCs with critical
data backed up between them. When an ssd bcache failed in one, and two
backing stores failed in the other almost simultaneously, I nearly had to resort
to offline backups to restore the data ... downtime was still a major pain.
I now have 5 x cheap arm systems and a small x86 master with 7 disks
across them - response time is good, power use seems less (being much
cooler and quieter) than running two over-the-top older PCs. The
reliability/recovery time (at least when I tested by manually failing
drives and causing power outages) is much better.
I am using moosefs, but lizardfs looks similar and both can offer
erasure coding, which gives more storage space while still allowing recovery if you
have enough disks.
Downside is maintaining more systems, more complex networking and the
like - It's been a few months now, and I won't be going back to raid for
my main storage.
BillK