Subject: Re: [gentoo-user] which linux RAID setup to choose?
To: gentoo-user@lists.gentoo.org
References: <2251dac1-92cd-7c3b-97ea-6a061fe01eb0@users.sourceforge.net>
From: antlists
Date: Mon, 4 May 2020 00:19:37 +0100

On 03/05/2020 22:46, Caveman Al Toraboran wrote:
> On Sunday, May 3, 2020 6:27 PM, Jack wrote:
>
> curious. how do people look at --layout=n2 in the
> storage industry? e.g.
do they ignore the optimistic case where some
> 2-disk failures can be recovered, and only assume
> that it protects against 1 disk failure?

You CANNOT afford to be optimistic ... Murphy's law says you will lose 
the wrong second disk.

> i see why gambling is not worth it here, but at
> the same time, i see no reason to ignore reality
> (that some 2-disk failures can be saved).

Don't ignore that some 2-disk failures CAN'T be saved ...

> e.g. a 4-disk RAID10 with --layout=n2 gives
>
>     1*4/10 + 2*4/10 = 1.2
>
> expected recoverable disk failures. details are
> below:
>
> now, if we do a 5-disk --layout=n2, we get:
>
>     disk:   1     2     3     4     5
>             1    (1)    2    (2)    3
>            (3)    4    (4)    5    (5)
>             6    (6)    7    (7)    8
>            (8)    9    (9)   10   (10)
>            11   (11)   12   (12)   13
>           (13)  ...
>
> obviously, there are 5 possible ways a single disk
> may fail, out of which all 5 can be recovered.

Don't forget a 4-disk + spare layout, which *should* survive a 2-disk 
failure.

> there are nchoosek(5,2) = 10 possible ways a
> 2-disk failure could happen, out of which 5 can
> be recovered.
>
> so, by transforming a 4-disk RAID10 into a 5-disk
> one, we increase total storage capacity by half a
> disk's worth of storage, while losing the ability
> to recover 0.2 disks.
>
> but if we extend the 4-disk RAID10 into a 6-disk
> --layout=n2, we will have:
>
>   = 1 * 6/(6 + nchoosek(6,2))
>       + 2 * (nchoosek(6,2) - 3)/(6 + nchoosek(6,2))
>
>   = 6/21 + 2 * 12/21
>
>   = 1.4286 expected recoverable failing disks.
>
> i.e. given that 2 disks fail, there is an 80%
> (12/15) chance of surviving.
>
> so, i wonder: is it a bad decision to go with an
> even number of disks in a RAID10? what is the
> right way to think to find an answer to this
> question?
>
> i guess the ultimate answer needs knowledge of
> these:
>
> * F1: probability of having 1 disk fail within
>   the repair window.
> * F2: probability of having 2 disks fail within
>   the repair window.
> * F3: probability of having 3 disks fail within
>   the repair window.
>   ...
> * Fn: probability of having n disks fail within
>   the repair window.
>
> * R1: probability of surviving a 1-disk failure.
>   equals 1 in all related cases.
> * R2: probability of surviving a 2-disk failure.
>   equals 5/10 = 0.5 with a 5-disk RAID10.
>   equals 12/15 = 0.8 with a 6-disk RAID10.
> * R3: probability of surviving a 3-disk failure.
>   equals 0 in all related cases.
>   ...
> * Rn: probability of surviving an n-disk failure.
>   equals 0 in all related cases.
>
> * L : expected cost of losing data on an array.
> * D : price of a disk.

Don't forget, if you have a spare disk, the repair window is the length 
of time it takes to fail-over ...

> this way, the absolute expected cost when adopting
> a 6-disk RAID10 is:
>
>   = 6D + F1*(1-R1)*L + F2*(1-R2)*L + F3*(1-R3)*L + ...
>   = 6D + F1*(1-1)*L + F2*(1-0.8)*L + F3*(1-0)*L + ...
>   = 6D + 0 + F2*(0.2)*L + F3*L + ...
>
> and the absolute cost for a 5-disk RAID10 is:
>
>   = 5D + F1*(1-1)*L + F2*(1-0.5)*L + F3*(1-0)*L + ...
>   = 5D + 0 + F2*(0.5)*L + F3*L + ...
>
> canceling identical terms, the difference cost is:
>
>   6-disk ===> 6D + 0.2*F2*L
>   5-disk ===> 5D + 0.5*F2*L
>
> from here [1] we know that a 1TB disk costs
> $35.85, so:
>
>   6-disk ===> 6*35.85 + 0.2*F2*L
>   5-disk ===> 5*35.85 + 0.5*F2*L
>
> now, at which point is a 5-disk array a better
> economical decision than a 6-disk one? for
> simplicity, let LOL = F2*L:
>
>   5*35.85 + 0.5*LOL < 6*35.85 + 0.2*LOL
>   0.5*LOL - 0.2*LOL < 6*35.85 - 5*35.85
>   LOL * (0.5 - 0.2) < 35.85
>   LOL < 35.85 / 0.3
>   LOL < 119.5
>
> so, a 5-disk RAID10 is better than a 6-disk RAID10
> only if:
>
>   F2*L < 119.5 bucks.
>
> this site [2] says that 76% of seagate disks fail
> per year (:D). and since disks mostly fail
> independently of each other, the probability of
> having 2 disks fail in a year is:

76% seems incredibly high. And no, disks do not fail independently of 
each other.
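As an aside, the survivable-pair counting is easy to sanity-check with a 
short script. This is just a sketch, assuming the near-2 layout places 
copy k of block b at slot 2*b + k and assigns slots round-robin across 
the disks, wrapping odd-width rows exactly as in the diagram quoted 
above:

```python
from itertools import combinations

def fatal_pairs(n_disks):
    """Disk pairs that hold BOTH copies of some block in an
    assumed RAID10 near-2 layout: copy k of block b sits in
    slot 2*b + k, and slot s lands on disk s % n_disks."""
    fatal = set()
    # n_disks * 2 blocks is more than enough for the pattern to cycle
    for b in range(n_disks * 2):
        fatal.add(frozenset(((2 * b) % n_disks,
                             (2 * b + 1) % n_disks)))
    return fatal

def survivable(n_disks):
    """Count the 2-disk failures the array survives: every pair of
    disks that is NOT a mirror pair for some block."""
    fatal = fatal_pairs(n_disks)
    pairs = [frozenset(p) for p in combinations(range(n_disks), 2)]
    ok = sum(p not in fatal for p in pairs)
    return ok, len(pairs)

for n in (4, 5, 6):
    ok, total = survivable(n)
    print(f"{n} disks: {ok}/{total} two-disk failures survivable")
```

For 4, 5 and 6 disks this reports 4/6, 5/10 and 12/15, which lines up 
with the 4-disk and 6-disk (80%) figures quoted above.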
If you buy a bunch of identical disks, at the same time, and stick them 
all in the same raid array, the chances of them all wearing out at the 
same time are rather higher than random chance would suggest.

Which is why, if a raid disk fails, the advice is always to replace it 
asap. And, if possible, to recover the failed drive and copy from that, 
rather than hammer the rest of the raid.

Bear in mind that it doesn't matter how many drives a raid-10 has: if 
you're recovering on to a new drive, the data is stored on just two of 
the other drives. So the chances of those two failing as they get 
hammered are a lot higher.

That's why it makes a lot of sense to monitor the SMART attributes, so 
you can replace any drive that looks like it's failing before it 
actually does.

And check the warranties. Expensive raid drives probably have longer 
warranties, so when they're out of warranty consider retiring them 
(they'll probably last a lot longer, but it's a judgement call).

All that said, I've been running a raid-1 mirror for a good few years, 
and I've not had any trouble with my Barracudas.

Cheers,
Wol