From mboxrd@z Thu Jan  1 00:00:00 1970
To: gentoo-amd64@lists.gentoo.org
From: Duncan <1i5t5.duncan@cox.net>
Subject: [gentoo-amd64] Re: Is my RAID performance bad possibly due to starting sector value?
Date: Sat, 22 Jun 2013 10:29:59 +0000 (UTC)

Rich Freeman posted on Fri, 21 Jun 2013 11:13:51 -0400 as excerpted:

> On Fri, Jun 21, 2013 at 10:27 AM, Duncan <1i5t5.duncan@cox.net> wrote:
>> Question: Would you use [btrfs] for raid1 yet, as I'm doing?
>> What about as a single-device filesystem?

> If I wanted to use raid1 I might consider using btrfs now.  I think it
> is still a bit risky, but the established use cases have gotten a fair
> bit of testing now.  I'd be more confident in using it with a single
> device.

OK, so we agree on the basic confidence level of various btrfs
features.  I trust my own judgement a bit more now. =:^)

> To migrate today would require finding someplace to dump all the data
> offline and migrate the drives, as there is no in-place way to
> migrate multiple ext3/4 logical volumes on top of mdadm to a single
> btrfs on bare metal.

... Unless you still have enough unpartitioned space available.

What I did a few years ago was buy a 1 TB USB drive that I found at a
good price.  (It was very near the price of half-TB drives at the
time; I figured out later the store must have been shipped a pallet of
the wrong ones for a sale on the half-TB version of the same thing, so
it was a single-store, get-it-while-they're-there-to-get deal.)

That's how I was able to migrate from the raid6 I had back to raid1.
I had to squeeze the data/partitions a bit to get everything to fit,
but it did, and that's how I ended up with 4-way raid1, since it /had/
been a 4-way raid6.  All 300-gig drives at the time, so the TB USB
drive had /plenty/ of room. =:^)
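For anyone wanting to do the same offline shuffle, it looks roughly
like the below.  This is only a sketch with hypothetical device names
and mount points (and ext4 standing in for whatever filesystem you
actually run); adapt partition sizes to your own layout:

  # copy everything to the temporary USB drive, preserving attributes
  rsync -aHAX /mnt/raid6/ /mnt/usb-1tb/

  # tear down the old raid6 and wipe its md metadata
  mdadm --stop /dev/md0
  mdadm --zero-superblock /dev/sd[a-d]1

  # recreate the same partitions as a 4-way raid1
  mdadm --create /dev/md0 --level=1 --raid-devices=4 /dev/sd[a-d]1

  # new filesystem, then copy the data back
  mkfs.ext4 /dev/md0
  mount /dev/md0 /mnt/raid1
  rsync -aHAX /mnt/usb-1tb/ /mnt/raid1/

Naturally, if the root filesystem is among those being migrated, you'd
run all of that booted from the USB copy or a rescue environment.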
> Without replying to anything in particular, both you and Bob have
> mentioned the importance of multiple redundancy.
>
> Obviously risk goes down as redundancy goes up.  If you protect 25
> drives of data with 1 drive of parity then you need 2/26 drives to
> fail to hose 25 drives of data.

Ouch!

> If you protect 1 drive of data with 25 drives of parity (call them
> mirrors or parity or whatever - they're functionally equivalent) then
> you need 25/26 drives to fail to lose 1 drive of data.

Almost correct.  Except that with 25/26 failed, you'd still have 1
working, which with raid1/mirroring would be enough.

(AFAIK that's the difference with parity.  Parity raid is generally
done on a minimum of three devices, two of data plus a third of
parity, and going down to just one device isn't enough; you can lose
only one, or two if you have two-way parity as with raid6.  With
mirroring/raid1 the devices are all essentially identical, so one is
enough to keep going; you'd have to lose 26/26 to be dead in the
water.  But 25/26 dead or 26/26 dead, you better HOPE it never comes
down to where that matters!)

> RAID 1 is actually less effective - if you protect 13 drives of data
> with 13 mirrors you need 2/26 drives to fail to lose 1 drive of data
> (they just have to be the wrong 2).  However, you do need to consider
> that RAID is not the only way to protect data, and I'm not sure that
> multiple-redundancy raid-1 is the most cost-effective strategy.

The first time I read that thru, I read it wrong and was about to
disagree.  Then I realized what you meant... and that it was an
equally valid reading of what you wrote, except...

AFAIK 13 drives of data with 13 mirrors wouldn't (normally) be called
raid1 (unless it's 13 individual raid1s).  An arrangement of that
nature, if configured together, would normally be set up as raid10,
2-way-mirrored and 13-way-striped (or possibly raid0+1, but that's not
recommended for technical reasons having to do with rebuild thruput).
It could also be configured as what mdraid calls linear mode (which
isn't really raid, but happens to be handled by the same md/raid
driver in Linux) across the 13, plus raid1, or, if they're configured
as separate volumes, as 13 individual two-disk raid1s.  Any of those
might be what you meant (and the wording appears to favor 13
individual raid1s).

What I interpreted it as initially was a 13-way raid1, mirrored again
at a second level to 13 additional drives, which would be called
raid11, except that there's no benefit of that over a simple
single-layer 26-way raid1, so the raid11 term is seldom seen, and
that's clearly not what you meant.

Anyway, you're correct if it's just two-way-mirrored.  However, at
that scale, if one was to do only two-way mirroring, one would usually
do either raid10 for the 13-way striping, or 13 separate raid1s, which
would give one the opportunity to make some of them 3-way-mirrored (or
more) raid1s for the really vital data, leaving the less vital data as
simple 2-way-mirror raid1s.

Or raid6, and get loss-of-two tolerance, but as this whole subthread
is discussing, that can be problematic for thruput.  (I've
occasionally seen reference to raid7, which is said to be 3-way-parity
and thus loss-of-three tolerant, but AFAIK there's no support for it
in the kernel, and I wouldn't be surprised if all implementations are
proprietary.  AFAIK, in practice, once that many devices get involved,
it's raid10 with N-way mirroring on the raid1 portion that gets
implemented, or other multi-level raid schemes.)
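Purely to make the naming concrete, here's roughly what those two
arrangements look like in mdadm terms.  A sketch only; the device
names are hypothetical, and mdadm's raid10 picks the number of mirror
copies via --layout:

  # 26 drives, 2 copies of each stripe (raid10 "near-2" layout):
  # 13 drives of usable capacity, and at least one copy of every
  # stripe has to survive
  mdadm --create /dev/md0 --level=10 --layout=n2 --raid-devices=26 \
        /dev/sd[a-z]1

  # versus 13 independent two-disk raid1 pairs, with (say) one pair
  # bumped to a 3-way mirror for the really vital data
  mdadm --create /dev/md1 --level=1 --raid-devices=2 \
        /dev/sda1 /dev/sdb1
  mdadm --create /dev/md2 --level=1 --raid-devices=3 \
        /dev/sdc1 /dev/sdd1 /dev/sde1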
> If I had 2 drives of data to protect and had 4 spare drives to do it
> with, I doubt I'd set up a 3x raid-1/5/10 setup (or whatever you want
> to call it - imho raid "levels" are poorly named as there really is
> just striping and mirroring and adding RS parity and everything else
> is just combinations).  Instead I'd probably set up a
> RAID1/5/10/whatever with single redundancy for faster storage and
> recovery, and an offline backup (compressed and with
> incrementals/etc).  The backup gets you more security and you only
> need it in a very unlikely double-failure.  I'd only invest in
> multiple redundancy in the event that the risk-weighted cost of
> having the node go down exceeds the cost of the extra drives.
> Frankly in that case RAID still isn't the right solution - you need a
> backup node someplace else entirely as hard drives aren't the only
> thing that can break in your server.

So we're talking six drives, two of data and four "spares" to play
with.  Often that's set up as raid10, either two-way-striped and
3-way-mirrored, or 3-way-striped and 2-way-mirrored, depending on
whether the loss-of-two tolerance of 3-way mirroring or the thruput of
3-way striping is considered of higher value.

You're right that at that level you DO need a real backup, and it
should take priority over raid-whatever.

HOWEVER, instead of creating a SINGLE raid across all those drives,
it's possible to partition them up and create multiple raids out of
the partitions, with one set being a backup of the other.  And since
you've already stated that there's only two drives worth of data,
there's certainly room enough amongst the six drives total to do just
that.

This is in fact how I ran my raids, both my raid6 config and my raid1
config, for a number of years, and is in fact how I have my
(raid1-mode) btrfs filesystems set up now on the SSDs.

Effectively I had/have each drive partitioned up into two sets of
partitions, my "working" set and my "backup" set.  Then I md-raided
each partition, at my chosen level, across all devices.  So on each
physical device, partition 5 might be the working rootfs partition,
partition 6 the working home partition... partition 9 the backup
rootfs partition, and partition 10 the backup home partition.  They
might end up being md3 (rootwork), md4 (homework), md7 (rootbak) and
md8 (homebak).  (See the sketch at the end of this section.)

That way, you're protected against physical device death by the
redundancy of the raids, and from fat-fingering or an update gone
wrong by the redundancy of the backup partitions across the same
physical devices.

What's nice about an arrangement such as this is that it gives you
quite a bit more flexibility than you'd have with a single raid, since
it's now possible to decide, "Hmm, I don't think I actually need a
backup of /var/log, so I think I'll only run with one log
partition/raid, instead of the usual working/backup arrangement."

Similarly, "You know, I ultimately don't need backups of the gentoo
tree and overlays, or of the kernel git tree, at all, since as Linus
says, 'Real men upload it to the net and let others be their backup,'
and I can always redownload that from the net, so I think I'll raid0
this partition and not keep any copies at all, since re-downloading's
less trouble than dealing with the backups anyway."

Finally, and possibly critically, it's possible to say, "You know,
what happens if I've just wiped rootbak in order to make a new root
backup, and I have a crash and working-root refuses to boot?  I think
I need a rootbak2, and with the space I saved by doing only one log
partition and by making the sources trees raid0, I have room for it
now, without using any more space than I would, had I had everything
on the same raid."
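In concrete mdadm terms, the working/backup arrangement described
above might look like the minimal sketch below, assuming four drives
sda-sdd partitioned identically (the device and md numbers are just
the hypothetical ones from the example):

  # working set, each partition raided across all four drives
  mdadm --create /dev/md3 --level=1 --raid-devices=4 /dev/sd[a-d]5
  mdadm --create /dev/md4 --level=1 --raid-devices=4 /dev/sd[a-d]6

  # backup set: same raid level, different partitions, same drives
  mdadm --create /dev/md7 --level=1 --raid-devices=4 /dev/sd[a-d]9
  mdadm --create /dev/md8 --level=1 --raid-devices=4 /dev/sd[a-d]10

Refreshing rootbak is then just a matter of mounting md7 and copying
the current md3 contents over, from the running system.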
Another nice thing about it, and this is what I would have ended up
doing if I hadn't conveniently found that 1 TB USB drive at such a
good price, is that while the whole thing is partitioned up and in
use, it's very possible to temporarily wipe out the backup partitions,
recreate them as a different raid level or a different filesystem, or
otherwise reorganize that area, then reboot into the new version and
do the same to what were the working copies.

(For the area that was raid0: well, it was raid0 because it's easy to
recreate, so just blow it away and recreate it on the new layout.  And
for the single-raid log without a backup copy, it's simple enough to
point the log elsewhere, or keep it on rootfs long enough to redo that
set of partitions across all physical devices.)

Again, this isn't just theory; it really works, as I've done it to
various degrees at various times, even if I found copying to the
external 1 TB USB drive and booting from it more convenient when I
transferred from raid6 to raid1.

And since I do run ~arch, there have been a number of times I've
needed to boot to rootbak instead of rootworking, including once when
a ~arch portage was hosing symlinks just as a glibc update came along,
thus breaking glibc (!!), once when a bash update broke, and another
time when a glibc update mostly worked but I needed to downgrade, and
the protection built into the glibc ebuild wasn't letting me do it
from my working root.

What's nice about this setup, in regard to booting to rootbak instead
of the usual working root, is that unlike booting to a liveCD/DVD
rescue disk, you have the full working system installed, configured,
and running just as it was when the backup was made.  That makes it
much easier to pick up and run from where you left off, with all the
tools you're used to having and the modes of working you're used to
using, instead of being limited to some artificial rescue environment,
often with limited tools, and in any case set up and configured
differently than your own system, because rootbak IS your own system,
just from a few days/weeks/months ago, whenever it was that you last
did the backup.

Anyway, with the parameters you specified, two drives full of data and
four spare drives (presumably of a similar size), there's a LOT of
flexibility:

* raid10 across four drives (two-mirror, two-stripe) with the other
  two as backup.  This would probably be my choice given the
  two-disks-of-data, six-disks-total constraints (but see below), and
  it appears it might be your choice as well.

* raid6 across four drives (two mirror, two parity) with two as
  backups.  Not a choice I'd likely make, but a choice.

* A working pair of drives plus two sets of backups.  Also not a
  choice I'd likely make.

* raid10 across all six drives, in either 3-mirror/2-stripe or
  3-stripe/2-mirror mode.  I'd probably elect for this with
  3-stripe/2-mirror, for the 3X speed and space, and prioritize a
  separate backup; see the discussion below.

* Two independent 3-disk raid5s.  IMO there are better options for
  most cases, with the possible exception of primarily slow-media
  usage; just which options are better depends on usage and priorities
  tho.

* Or some hybrid combination of these.
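As a concrete illustration of the reorganize-in-place trick mentioned
above, converting the backup set from md/raid1 to a btrfs raid1 might
go something like this.  Again just a sketch, reusing the hypothetical
device names from earlier; in practice you'd double-check which arrays
are which before zeroing anything:

  # retire the backup rootfs raid and wipe its md metadata
  umount /dev/md7
  mdadm --stop /dev/md7
  mdadm --zero-superblock /dev/sd[a-d]9

  # recreate the same partitions as a multi-device btrfs raid1
  mkfs.btrfs -d raid1 -m raid1 \
      /dev/sda9 /dev/sdb9 /dev/sdc9 /dev/sdd9

  # copy the working root over, then later reboot into it and give
  # what was the working set the same treatment
  btrfs device scan
  mount /dev/sda9 /mnt/newroot
  cp -ax / /mnt/newroot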
> This sort of rationale is why I don't like arguments like "RAM is
> cheap" or "HDs are cheap" or whatever.  The fact is that wasting
> money on any component means investing less in some other component
> that could give you more space/performance/whatever-makes-you-happy.
> If you have $1000 that you can afford to blow on extra drives then
> you have $1000 you could blow on RAM, CPU, an extra server, or a
> trip to Disney.  Why not blow it on something useful?

[This gets philosophical.  OK to quit here if uninterested.]

You're right.  "RAM and HDs are cheap"... relative to WHAT?  The
big-screen TV/monitor I WOULD have been replacing my much smaller
monitor with, if I hadn't been spending the money on the "cheap" RAM
and HDs?

Of course, "time is cheap" comes with the same caveats, and can
actually end up being far more dear.  Stress and hassle of
administration, similarly.  And sometimes just a bit of investment in
another "expensive" HD saves you quite a bit of "cheap" time and
stress that's actually more expensive.

"It's all relative"... to one's individual priorities.  Because one
thing's for sure: both money and time are fungible, and if they aren't
spent on one thing, they WILL be on another (even if that "spending"
is savings, for money), and ultimately it's one's individual
priorities that should rank where that spending goes.  And I can't set
your priorities and you can't set mine, so...

But from my observation, a LOT of folks don't realize that and/or
don't take the time necessary to reevaluate their own priorities from
time to time, so they end up spending out of line with their real
priorities, and end up rather unhappy people as a result!

That's one reason why I have a personal policy of deliberately
reevaluating personal priorities from time to time (as well as being
aware of them constantly), and rearranging spending, of money, time
and otherwise, in accordance with those reranked priorities.  I'm
absolutely positive I'm a happier man for doing so! =:^)

-- 
Duncan - List replies preferred.  No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman