Date: Sun, 03 May 2020 21:46:07 +0000
To: "gentoo-user@lists.gentoo.org" <gentoo-user@lists.gentoo.org>
From: Caveman Al Toraboran <toraboracaveman@protonmail.com>
Subject: Re: [gentoo-user] which linux RAID setup to choose?
Message-ID: <r_Q9jvM58EU2pwZlP_Y-68RWGty_14cd-2tWbp0SkzuYCp_NjKveJ5N2u_C7-MDj_ECdnP7ITK-fEikxX-u-j9qZkc8K6zMSUerYoduMq5c=@protonmail.com>
In-Reply-To: <2251dac1-92cd-7c3b-97ea-6a061fe01eb0@users.sourceforge.net>
References: <dri0tBrXDazCGtc_Eu0IwV0R1chgd2giA9ZqGEs8LOJa3vAwAreuXaIR2MeyOgAfXi51yqLcR5NpxDSFY5ss1igKxRAM50hSu7mXY0Y-I78=@protonmail.com>
 <2251dac1-92cd-7c3b-97ea-6a061fe01eb0@users.sourceforge.net>

On Sunday, May 3, 2020 6:27 PM, Jack <ostroffjh@users.sourceforge.net> wrote:

> Minor point - you have one duplicate line there ". f  f ." which is the
> second and last line of the second group.  No effect on anything else in
> the discussion.

thanks.

> Trying to help thinking about odd numbers of disks, if you are still
> allowing only one disk to fail, then you can think about mirroring half
> disks, so each disk has half of it mirrored to a different disk, instead
> of drives always being mirrored in pairs.

that definitely helped get me unstuck and continue
thinking.  thanks.

curious.  how do people in the storage industry
look at --layout=n2?  e.g. do they ignore the
optimistic case where 2 disk failures can be
recovered, and only assume that it protects
against 1 disk failure?

i see why gambling is not worth it here, but at
the same time, i see no reason to ignore reality
(that some 2 disk failures can be survived).

e.g. a 4-disk RAID10 with --layout=n2 gives

        1*4/10 + 2*4/10 = 1.2

expected recoverable disk failures, counting the 4
single-disk and 6 two-disk failure cases as 10
equally likely cases.  details are below:

  F   .       .   .       < recoverable
  .   F       .   .       < cases with
  .   .       F   .       < 1 disk
  .   .       .   F       < failure

  F   .       .   F       < recoverable
  .   F       F   .       < cases with
  .   F       .   F       < 2 disk
  F   .       F   .       < failures

  F   F       .   .       < not recoverable
  .   .       F   F       < cases with 2 disk
                          < failures
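the 4-disk count above can be checked by brute force.  below is a minimal python sketch, assuming (as in the table) that disks 1+2 and 3+4 are the two mirror pairs and that all 10 failure cases are equally likely.  disks are numbered from 0 here, and the names N_DISKS, MIRROR_PAIRS, survives and expected_recoverable are made up for illustration.

```python
from fractions import Fraction
from itertools import combinations

# model of the 4-disk --layout=n2 table: disks 0..3,
# where disks 0+1 and 2+3 each hold both copies of their chunks
N_DISKS = 4
MIRROR_PAIRS = [{0, 1}, {2, 3}]

def survives(failed):
    """data survives unless some mirror pair lost both of its disks."""
    return not any(pair <= failed for pair in MIRROR_PAIRS)

def expected_recoverable(max_failures=2):
    """the post's metric: sum of k over recoverable k-disk cases / all cases."""
    cases = [set(c)
             for k in range(1, max_failures + 1)
             for c in combinations(range(N_DISKS), k)]
    recovered = sum(len(c) for c in cases if survives(c))
    return Fraction(recovered, len(cases))

print(expected_recoverable())  # 6/5, i.e. 1.2
```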

now, if we do a 5-disk --layout=n2, we get:

    1    (1)    2    (2)    3
   (3)    4    (4)    5    (5)
    6    (6)    7    (7)    8
   (8)    9    (9)    10   (10)
    11   (11)   12   (12)   13
   (13) ...
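this wrap-around is Jack's idea of mirroring half of each disk: with an odd number of disks, a chunk's second copy can land on the next row.  a small python sketch that redraws the grid, assuming the placement is simply "write every chunk twice into consecutive slots, filling rows left to right" as drawn above (a guess at the pattern, not md's actual code; near2_layout is a hypothetical name):

```python
def near2_layout(n_disks, n_rows):
    """fill an n_rows x n_disks grid row by row, writing every chunk twice
    into consecutive slots; the second copy is shown in parentheses."""
    rows = []
    for r in range(n_rows):
        row = []
        for d in range(n_disks):
            slot = r * n_disks + d      # position in row-major order
            chunk = slot // 2 + 1       # chunks are numbered from 1
            row.append(str(chunk) if slot % 2 == 0 else f"({chunk})")
        rows.append(row)
    return rows

for row in near2_layout(5, 4):
    print("  ".join(f"{cell:>4}" for cell in row))
```

with an even disk count the same rule never wraps a chunk across rows, which gives the plain mirror pairs of the 4-disk table.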

obviously, there are 5 possible ways a single disk
may fail, and all 5 of them are recoverable.

there are nchoosek(5,2) = 10 possible ways a 2
disk failure could happen, out of which 5
will be recovered:

   xxx   (1)   xxx   (2)    3
   xxx    4    xxx    5    (5)

   xxx   (1)    2    xxx    3
   xxx    4    (4)   xxx   (5)


    1    xxx    2    xxx    3
   (3)   xxx   (4)   xxx   (5)

    1    xxx    2    (2)   xxx
   (3)   xxx   (4)    5    xxx


    1    (1)   xxx   (2)   xxx
   (3)    4    xxx    5    xxx

so, expected recoverable disk failures for a
5-disk RAID10 --layout=n2 is:

        1*5/15 + 2*5/15 = 1
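the same 5 survivable pairs can be found mechanically.  this python sketch assumes the placement from the drawing, i.e. chunk c's two copies sit on disks 2c mod N and (2c+1) mod N (disks counted from 0, so disks 1..5 above become 0..4), and uses the same weighting as the formula, where every 1- and 2-disk case counts once.  all function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def mirror_pairs(n_disks):
    """disk pairs holding both copies of some chunk in the near-2 drawing."""
    # chunk c occupies slots 2c and 2c+1; slot s sits on disk s % n_disks.
    # n_disks consecutive chunks already cover every pairing that occurs.
    return {frozenset(((2 * c) % n_disks, (2 * c + 1) % n_disks))
            for c in range(n_disks)}

def survivable_pairs(n_disks):
    """all 2-disk failures that leave at least one copy of every chunk."""
    fatal = mirror_pairs(n_disks)
    return [p for p in combinations(range(n_disks), 2)
            if frozenset(p) not in fatal]

def expected_recoverable(n_disks):
    """the post's metric: (1 * #ok singles + 2 * #ok pairs) / #all cases."""
    ok_pairs = len(survivable_pairs(n_disks))
    n_cases = n_disks + n_disks * (n_disks - 1) // 2
    return Fraction(1 * n_disks + 2 * ok_pairs, n_cases)

print(survivable_pairs(5))      # [(0, 2), (0, 3), (1, 3), (1, 4), (2, 4)]
print(expected_recoverable(5))  # 1
```

with 0-based numbering, those five pairs are the five xxx/xxx pictures above, in the same order, and the same function returns 6/5 for 4 disks.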

so, by transforming a 4-disk RAID10 into a 5-disk
one, we increase total storage capacity by half a
disk's worth of storage, while the expected number
of recoverable disk failures drops by 0.2 (from
1.2 to 1).

but if we extended the 4-disk RAID10 into a
6-disk --layout=n2 one, we would have (only the 3
mirror pairs are fatal 2 disk combinations, hence
the "- 3"):

             6                  nchoosek(6,2) - 3
= 1 * -----------------  +  2 * -----------------
      6 + nchoosek(6,2)         6 + nchoosek(6,2)

= 6/21                   +  2 * 12/21

= 30/21

= 1.4286 expected recoverable failing disks.

i.e. given a 2 disk failure, there is an 80% (12/15)
chance of surviving it.
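the 6-disk numbers can be checked the same way, and the count extends beyond 2 failures.  a hypothetical python sketch, assuming the near-2 placement of the drawings (on 6 disks this gives the 3 mirror pairs 1+2, 3+4, 5+6) and treating every combination of failed disks as equally likely:

```python
from fractions import Fraction
from itertools import combinations

def mirror_pairs(n_disks):
    """disk pairs holding both copies of some chunk (near-2 drawing, 0-based)."""
    return {frozenset(((2 * c) % n_disks, (2 * c + 1) % n_disks))
            for c in range(n_disks)}

def survival_fraction(n_disks, k):
    """share of the k-disk failure combinations that lose no chunk."""
    fatal = mirror_pairs(n_disks)
    combos = [frozenset(c) for c in combinations(range(n_disks), k)]
    ok = [c for c in combos if not any(pair <= c for pair in fatal)]
    return Fraction(len(ok), len(combos))

for k in range(1, 5):
    print(f"6 disks, {k} failed: {survival_fraction(6, k)}")
```

on this model a 3 disk failure on 6 disks is survivable in 8 of the 20 combinations (one disk lost from each mirror pair), while 4 failures always hit some pair.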

so, i wonder, is it a bad decision to go with an
odd number of disks with a RAID10?  what is the
right way to think about this question?
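one way to look at the odd/even question is to sweep the disk count and compare the chance of surviving a 2 disk failure.  a hypothetical python sketch, again assuming the near-2 placement of the drawings and that every pair of failed disks is equally likely:

```python
from fractions import Fraction
from math import comb

def mirror_pairs(n_disks):
    """disk pairs holding both copies of some chunk (near-2 drawing, 0-based)."""
    return {frozenset(((2 * c) % n_disks, (2 * c + 1) % n_disks))
            for c in range(n_disks)}

def two_disk_survival(n_disks):
    """fraction of the C(N,2) two-disk failures that lose no chunk."""
    fatal = len(mirror_pairs(n_disks))
    return Fraction(comb(n_disks, 2) - fatal, comb(n_disks, 2))

for n in range(3, 9):
    r2 = two_disk_survival(n)
    print(f"{n} disks: capacity {n / 2} disks, "
          f"survives 2 failures {r2} ({float(r2):.2f})")
```

with these assumptions an odd count has N fatal pairs instead of N/2, so e.g. 7 disks survive a 2 disk failure with probability 2/3, no better than 4 disks.  this ignores that more disks also means a higher chance of 2 failures in the first place.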

i guess the ultimate answer needs knowledge of
these:

    * F1: probability of having 1 disk fail within
          the repair window.
    * F2: probability of having 2 disks fail within
          the repair window.
    * F3: probability of having 3 disks fail within
          the repair window.
      .
      .
      .
    * Fn: probability of having n disks fail within
          the repair window.

    * R1: probability of surviving a 1 disk failure.
          equals 1 in all cases considered here.
    * R2: probability of surviving a 2 disk failure.
          equals 5/10 = 0.5 with a 5-disk RAID10,
          equals 12/15 = 0.8 with a 6-disk RAID10.
    * R3: probability of surviving a 3 disk failure.
          equals 0 with a 5-disk RAID10.  with a
          6-disk RAID10 it is 8/20 = 0.4 (one disk
          lost from each mirror pair), but below it
          is taken as 0, which only understates the
          6-disk array's advantage.
      .
      .
      .
    * Rn: probability of surviving an n disk failure.
          equals 0 for n >= 4 in all cases considered
          here.

    * L : cost of losing the data on an array.
    * D : price of a disk.

this way, the absolute expected cost when adopting
a 6-disk RAID10 is:

= 6D + F1*(1-R1)*L + F2*(1-R2)*L   + F3*(1-R3)*L + ...
= 6D + F1*(1-1)*L  + F2*(1-0.8)*L  + F3*(1-0)*L  + ...
= 6D + 0           + F2*(0.2)*L    + F3*(1-0)*L  + ...

and the absolute expected cost for a 5-disk RAID10 is:

= 5D + F1*(1-1)*L  + F2*(1-0.5)*L  + F3*(1-0)*L  + ...
= 5D + 0           + F2*(0.5)*L    + F3*(1-0)*L  + ...

canceling identical terms, what remains to compare is:

6-disk ===> 6D + 0.2*F2*L
5-disk ===> 5D + 0.5*F2*L

from here [1] we know that a 1TB disk costs
$35.85, so:

6-disk ===> 6*35.85 + 0.2*F2*L
5-disk ===> 5*35.85 + 0.5*F2*L

now, at which point is a 5-disk array a more
economical choice than a 6-disk one?  for
simplicity, let LOL = F2*L:

5*35.85 + 0.5 * LOL  <   6*35.85 + 0.2 * LOL
0.5*LOL - 0.2 * LOL  <   6*35.85 - 5*35.85
LOL * (0.5 - 0.2)    <   6*35.85 - 5*35.85

                         6*35.85 - 5*35.85
           LOL       <   -----------------
                             0.5 - 0.2

           LOL       <   119.5
           F2*L      <   119.5

so, a 5-disk RAID10 is better than a 6-disk RAID10
only if:

        F2*L  <  119.5 bucks.

this site [2] says that 76% of seagate disks fail
per year (:D).  since disks mostly fail
independently of each other, the probability of
having 2 disks fail in a year is:

        F2_year = 0.76*0.76
                = 0.5776

but what is F2_week?  each year has 52.1429 weeks,
rounded to 52 below.  let's be generous and assume
that disk failures are uniformly distributed across
the year (e.g. suppose that we bought the disks at
random times, and not in a single batch).

in this case, the probability of 2 disks failing
in the same week (suppose that our repair window
is 1 week) is:

                          52
    F2 = 0.5776 * --------------------
                 52 + nchoosek(52, 2)

       = 0.5776 * 0.037736
       = 0.021796

let's substitute a bit:

        F2 * L  <  119.5  bucks.
  0.021796 * L  <  119.5  bucks.
             L  <  119.5 / 0.021796  bucks.
             L  <  5482.7  bucks.

so, in summary:

 /------------------------------------------------\
 | a 5-disk RAID10 is better than a 6-disk RAID10 |
 | ONLY IF your data is WORTH LESS than 5,482.7   |
 | bucks.                                         |
 \------------------------------------------------/
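the whole comparison fits in a few lines of python.  this sketch just reproduces the arithmetic with the post's own inputs: the $35.85 price from [1], the 0.76 yearly figure quoted from [2], a 1 week repair window with the same-week factor 52 / (52 + nchoosek(52,2)) as written above, and 3+ disk failures treated as fatal.  the 2 disk survival fractions are counted from the layouts instead of typed in.  all names are hypothetical.

```python
from fractions import Fraction
from math import comb

DISK_PRICE = 35.85   # D: price of a 1TB disk, from [1]
YEARLY_FAIL = 0.76   # yearly failure figure as quoted from [2]
WEEKS = 52           # 1 week repair window, 52 weeks per year

def mirror_pairs(n_disks):
    """disk pairs holding both copies of some chunk (near-2 drawing, 0-based)."""
    return {frozenset(((2 * c) % n_disks, (2 * c + 1) % n_disks))
            for c in range(n_disks)}

def r2(n_disks):
    """share of 2-disk failures that lose no data."""
    total = comb(n_disks, 2)
    return Fraction(total - len(mirror_pairs(n_disks)), total)

# F2 as estimated in the post: two disks fail within the year,
# times the post's same-week factor 52 / (52 + nchoosek(52, 2)).
f2_year = YEARLY_FAIL * YEARLY_FAIL
f2_week = f2_year * WEEKS / (WEEKS + comb(WEEKS, 2))

# 5 disks win iff 5D + (1 - R2_5)*F2*L < 6D + (1 - R2_6)*F2*L
loss_share_gap = float((1 - r2(5)) - (1 - r2(6)))   # 0.5 - 0.2 = 0.3
lol_limit = DISK_PRICE / loss_share_gap            # bound on F2 * L
value_limit = lol_limit / f2_week                  # bound on L itself

print(f"R2: 5 disks {r2(5)}, 6 disks {r2(6)}")
print(f"5 disks only pay off if F2*L < {lol_limit:.1f}")
print(f"i.e. if the data is worth less than {value_limit:.1f}")
```

F2 is not rounded in the sketch, so its last printed digit may differ slightly from the hand calculation.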

any thoughts?  i'm a newbie, and i wonder how
industry people think about this.

happy quarantine,
cm

------------
[1] https://www.amazon.com/WD-Blue-1TB-Hard-Drive/dp/B0088PUEPK/
[2] https://www.seagate.com/em/en/support/kb/hard-disk-drive-reliability-and-mtbf-afr-174791en/