From: Caveman Al Toraboran
To: "gentoo-user@lists.gentoo.org"
Date: Sun, 03 May 2020 21:46:07 +0000
Subject: Re: [gentoo-user] which linux RAID setup to choose?
In-Reply-To: <2251dac1-92cd-7c3b-97ea-6a061fe01eb0@users.sourceforge.net>
References: <2251dac1-92cd-7c3b-97ea-6a061fe01eb0@users.sourceforge.net>

On Sunday, May 3, 2020 6:27 PM, Jack wrote:

> Minor point - you have one duplicate line there ". f  f ." which is the
> second and last line of the second group.  No effect on anything else in
> the discussion.

thanks.

> Trying to help thinking about odd numbers of disks, if you are still
> allowing only one disk to fail, then you can think about mirroring half
> disks, so each disk has half of it mirrored to a different disk, instead
> of drives always being mirrored in pairs.

that definitely helped get me unstuck and continue
thinking.  thanks.

curious.  how do people look at --layout=n2 in the
storage industry?  e.g. do they ignore the
optimistic case where 2 disk failures can be
recovered, and only assume that it protects against
1 disk failure?

i see why gambling is not worth it here, but at
the same time, i see no reason to ignore reality
(that a 2 disk failure can be saved).

e.g. a 4-disk RAID10 with --layout=n2 gives

    1*4/10 + 2*4/10 = 1.2

expected recoverable disk failures.  details are
below:

    F   .   .   .   < recoverable
    .   F   .   .   < cases with
    .   .   F   .   < 1 disk
    .   .   .   F   < failure

    F   .   .   F   < recoverable
    .   F   F   .   < cases with
    .   F   .   F   < 2 disk
    F   .   F   .   < failures

    F   F   .   .   < not recoverable
    .   .   F   F   < cases with 2 disk
                    < failures

now, if we do a 5-disk --layout=n2, we get:

    1    (1)  2    (2)  3
    (3)  4    (4)  5    (5)
    6    (6)  7    (7)  8
    (8)  9    (9)  10   (10)
    11   (11) 12   (12) 13
    (13) ...

obviously, there are 5 possible ways a single disk
may fail, out of which all of the 5 will be
recovered.

there are nchoosek(5,2) = 10 possible ways a 2
disk failure could happen, out of which 5 will be
recovered:

    xxx  (1)  xxx  (2)  3
    xxx  4    xxx  5    (5)

    xxx  (1)  2    xxx  3
    xxx  4    (4)  xxx  (5)

    1    xxx  2    xxx  3
    (3)  xxx  (4)  xxx  (5)

    1    xxx  2    (2)  xxx
    (3)  xxx  (4)  5    xxx

    1    (1)  xxx  (2)  xxx
    (3)  4    xxx  5    xxx

so, expected recoverable disk failures for a
5-disk RAID10 --layout=n2 is:

    1*5/15 + 2*5/15 = 1

so, by transforming a 4-disk RAID10 into a 5-disk
one, we increase total storage capacity by 0.5
disk's worth of storage, while the expected number
of recoverable disk failures drops by 0.2.

but if we extended the 4-disk RAID10 into a
6-disk --layout=n2, we will have:

                  6                    nchoosek(6,2) - 3
    = 1 * ----------------- + 2 * -----------------
          6 + nchoosek(6,2)       6 + nchoosek(6,2)

    = 6/21 + 2 * 12/21

    = 1.4286

expected recoverable failing disks.  and since 12
of the 15 possible 2 disk failures are recoverable,
there is an 80% chance of surviving a 2 disk
failure.

so, i wonder, is it a bad decision to go with an
even number of disks with a RAID10?  what is the
right way to think to find an answer to this
question?

i guess the ultimate answer needs knowledge of
these:

    * F1: probability of having 1 disk fail within
          the repair window.
    * F2: probability of having 2 disks fail within
          the repair window.
    * F3: probability of having 3 disks fail within
          the repair window.
      .
      .
      .
    * Fn: probability of having n disks fail within
          the repair window.

    * R1: probability of surviving a 1 disk failure.
          equals 1 with all related cases.
    * R2: probability of surviving a 2 disk failure.
          equals 0.5 (5 of the 10 cases) with a
          5-disk RAID10.
          equals 0.8 (12 of the 15 cases) with a
          6-disk RAID10.
    * R3: probability of surviving a 3 disk failure.
          equals 0 with a 5-disk RAID10.  a 6-disk
          RAID10 can survive some 3 disk failures,
          but it is treated as 0 here for simplicity.
      .
      .
      .
    * Rn: probability of surviving an n disk failure.
          equals 0 with all related cases.

    * L : expected cost of losing data on an array.
    * D : price of a disk.

this way, the absolute expected cost when adopting
a 6-disk RAID10 is:

    = 6D + F1*(1-R1)*L + F2*(1-R2)*L  + F3*(1-R3)*L + ...
    = 6D + F1*(1-1)*L  + F2*(1-0.8)*L + F3*(1-0)*L  + ...
    = 6D + 0           + F2*(0.2)*L   + F3*(1-0)*L  + ...

and the absolute cost for a 5-disk RAID10 is:

    = 5D + F1*(1-1)*L  + F2*(1-0.5)*L + F3*(1-0)*L  + ...
    = 5D + 0           + F2*(0.5)*L   + F3*(1-0)*L  + ...

canceling identical terms, the costs that differ
are:

    6-disk ===> 6D + 0.2*F2*L
    5-disk ===> 5D + 0.5*F2*L

from here [1] we know that a 1TB disk costs
$35.85, so:

    6-disk ===> 6*35.85 + 0.2*F2*L
    5-disk ===> 5*35.85 + 0.5*F2*L

now, at which point is a 5-disk array a better
economical decision than a 6-disk one?  for
simplicity, let LOL = F2*L:

    5*35.85 + 0.5 * LOL  <  6*35.85 + 0.2 * LOL
    0.5*LOL - 0.2 * LOL  <  6*35.85 - 5*35.85
    LOL * (0.5 - 0.2)    <  6*35.85 - 5*35.85

            6*35.85 - 5*35.85
    LOL  <  -----------------
                0.5 - 0.2

    LOL  <  119.5
    F2*L <  119.5

so, a 5-disk RAID10 is better than a 6-disk RAID10
only if:

    F2*L < 119.5 bucks.

this site [2] says that 76% of seagate disks fail
per year (:D).  and since disks mostly fail
independently of each other, the probability of
having 2 disks fail in a year is:

    F2_year = 0.76*0.76
            = 0.5776

but what is F2_week?  each year has 52.1429 weeks.
let's be generous and assume that disk failures
are uniformly distributed across the year (e.g.
suppose that we bought them randomly, and not in a
single batch).  in this case, the probability of 2
disks failing in the same week (suppose that our
repair window is 1 week) is:

                           52
    F2 = 0.5776 * --------------------
                  52 + nchoosek(52, 2)

       = 0.5776 * 0.037736
       = 0.021796

let's substitute a bit:

    F2 * L       < 119.5 bucks.
    0.021796 * L < 119.5 bucks.
    L            < 119.5 / 0.021796 bucks.
    L            < 5482.7 bucks.

so, in summary:

/------------------------------------------------\
| a 5-disk RAID10 is better than a 6-disk RAID10 |
| ONLY IF your data is WORTH LESS than 5,482.7   |
| bucks.                                         |
\------------------------------------------------/

any thoughts?  i'm a newbie.  i wonder how
industry people think?

happy quarantine,
cm

------------

[1] https://www.amazon.com/WD-Blue-1TB-Hard-Drive/dp/B0088PUEPK/
[2] https://www.seagate.com/em/en/support/kb/hard-disk-drive-reliability-and-mtbf-afr-174791en/
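
p.s. a rough python sketch for checking this kind of counting by brute force instead of by hand. it is only a sketch under stated assumptions: it assumes that with --layout=n2 the two copies of chunk k land on disks (2k mod n) and (2k+1 mod n), which matches the 5-disk grid drawn in this mail but is my reading of the layout, not something taken from the md sources. the function names (mirror_pairs, survival_fraction, expected_recoverable, break_even_loss) are made up for illustration, and the price and failure figures are simply the ones used in this mail.

```python
from itertools import combinations
from math import comb


def mirror_pairs(n_disks):
    """ASSUMPTION: near-2 placement, i.e. the two copies of chunk k sit on
    disks (2k mod n) and (2k+1 mod n). Returns the set of such disk pairs."""
    return {
        frozenset(((2 * k) % n_disks, (2 * k + 1) % n_disks))
        for k in range(n_disks)  # n chunks cover a full placement period
    }


def survival_fraction(n_disks, n_failed):
    """Fraction of all n_failed-disk failure combinations that still leave
    at least one copy of every chunk (plain enumeration)."""
    pairs = mirror_pairs(n_disks)
    combos = list(combinations(range(n_disks), n_failed))
    survived = sum(
        1 for failed in combos
        if not any(pair <= set(failed) for pair in pairs)
    )
    return survived / len(combos)


def expected_recoverable(n_disks):
    """The metric used in this mail: over all 1-disk and 2-disk failure
    cases (equally weighted), the mean number of failed disks recovered."""
    total_cases = n_disks + comb(n_disks, 2)
    recovered = sum(
        f * survival_fraction(n_disks, f) * comb(n_disks, f) for f in (1, 2)
    )
    return recovered / total_cases


def break_even_loss(disk_price, r2_small, r2_large, f2):
    """Data value L below which the smaller array is cheaper, from
    (r2_large - r2_small) * f2 * L < disk_price (one extra disk)."""
    return disk_price / ((r2_large - r2_small) * f2)


if __name__ == "__main__":
    for n in (4, 5, 6):
        print(n, survival_fraction(n, 2), expected_recoverable(n))

    # figures taken from this mail: 76% yearly failure rate, 1-week window
    f2 = 0.76 ** 2 * 52 / (52 + comb(52, 2))
    print(break_even_loss(35.85, survival_fraction(5, 2),
                          survival_fraction(6, 2), f2))
```

calling survival_fraction(n, 3) also shows how pessimistic it is to take R3 as 0 for a given n, and swapping mirror_pairs for a different placement rule lets you test other layouts with the same counting.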