From: Duncan <1i5t5.duncan@cox.net>
To: gentoo-amd64@lists.gentoo.org
Subject: [gentoo-amd64] Re: Is my RAID performance bad possibly due to starting sector value?
Date: Sat, 22 Jun 2013 10:29:59 +0000 (UTC)
Message-ID: <pan$2d013$83032870$fb7caf22$c0cf28d0@cox.net>
In-Reply-To: CAGfcS_nCzMWWJdLc_3jPOWZdwgmV_YTLqks2+WT6yqhD1hTW5g@mail.gmail.com
Rich Freeman posted on Fri, 21 Jun 2013 11:13:51 -0400 as excerpted:
> On Fri, Jun 21, 2013 at 10:27 AM, Duncan <1i5t5.duncan@cox.net> wrote:
>> Question: Would you use [btrfs] for raid1 yet, as I'm doing?
>> What about as a single-device filesystem?
> If I wanted to use raid1 I might consider using btrfs now. I think it
> is still a bit risky, but the established use cases have gotten a fair
> bit of testing now. I'd be more confident in using it with a single
> device.
OK, so we agree on the basic confidence level of various btrfs features.
I trust my own judgement a bit more now. =:^)
> To migrate today would require finding someplace to dump all
> the data offline and migrate the drives, as there is no in-place way to
> migrate multiple ext3/4 logical volumes on top of mdadm to a single
> btrfs on bare metal.
... Unless you still have enough unpartitioned space available.
What I did a few years ago was buy a 1 TB USB drive I found at a good
price.  (It was very near the price of half-TB drives at the time; I
figured out later they must have gotten shipped a pallet of the wrong
ones for a sale on the half-TB version of the same thing, so it was a
single-store, get-it-while-they're-there-to-get deal.)
That's how I was able to migrate from the raid6 I had back to raid1. I
had to squeeze the data/partitions a bit to get everything to fit, but it
did, and that's how I ended up with 4-way raid1, since it /had/ been a 4-
way raid6. All 300-gig drives at the time, so the TB USB had /plenty/ of
room. =:^)
> Without replying to anything in particular both you and Bob have
> mentioned the importance of multiple redundancy.
>
> Obviously risk goes down as redundancy goes up. If you protect 25
> drives of data with 1 drive of parity then you need 2/26 drives to fail
> to hose 25 drives of data.
Ouch!
> If you protect 1 drive of data with 25 drives of parity (call them
> mirrors or parity or whatever - they're functionally equivalent) then
> you need 25/26 drives to fail to lose 1 drive of data.
Almost correct.
Except that with 25/26 failed, you'd still have one working copy, which
with raid1/mirroring would be enough.  (AFAIK that's the difference with
parity.  Parity is generally done on a minimum of two data devices plus
a third as parity, and going down to just one device isn't enough; you
can lose only one, or two if you have two-way parity as with raid6.
With mirroring/raid1 the copies are all essentially identical, so one
surviving device is enough to keep going; you'd have to lose 26/26 to be
dead in the water.  But 25/26 dead or 26/26 dead, you'd better HOPE it
never comes down to where that matters!)
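
To put rough numbers on that, here's a minimal sketch (my own
illustration, nothing from the md code) of how many simultaneous drive
losses each layout survives, assuming ideal devices:

  # "redundancy" = copies/parity beyond the data itself.
  def survives(redundancy, failed):
      """True if readable data remains after `failed` simultaneous losses."""
      return failed <= redundancy

  # 25 data drives + 1 parity: tolerates exactly one loss.
  print(survives(1, 2))    # False -- two losses hose 25 drives of data

  # 1 drive of data mirrored 26 ways: tolerates 25 losses.
  print(survives(25, 25))  # True  -- one surviving copy is still enough
  print(survives(25, 26))  # False -- only now is the data actually gone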
> RAID 1 is actually less effective - if you protect 13
> drives of data with 13 mirrors you need 2/26 drives to fail to lose 1
> drive of data (they just have to be the wrong 2). However, you do need
> to consider that RAID is not the only way to protect data, and I'm not
> sure that multiple-redundancy raid-1 is the most cost-effective
> strategy.
The first time I read that thru I read it wrong, and was about to
disagree. Then I realized what you meant... and that it was an equally
valid read of what you wrote, except...
AFAIK 13 drives of data with 13 mirrors wouldn't (normally) be called
raid1 (unless it's 13 individual raid1s).  Normally, an arrangement of
that nature, if configured together, would be raid10: 2-way-mirrored,
13-way-striped (or possibly raid0+1, but that's not recommended for
technical reasons having to do with rebuild thruput).  It could also be
configured as what mdraid calls linear mode (which isn't really raid,
but happens to be handled by the same md/raid driver in Linux) across
the 13, plus raid1, or, if they're configured as separate volumes, as 13
individual two-disk raid1s.  Any of these might be what you meant (and
the wording appears to favor 13 individual raid1s).
What I interpreted it as initially was a 13-way raid1, mirrored again at
a second level to 13 additional drives, which would be called raid11,
except that there's no benefit of that over a simple single-layer 26-way
raid1 so the raid11 term is seldom seen, and that's clearly not what you
meant.
Anyway, you're correct if it's just two-way-mirrored. However, at that
level, if one was to do only two-way-mirroring, one would usually do
either raid10 for the 13-way striping, or 13 separate raid1s, which would
give one the opportunity to make some of them 3-way-mirrored (or more)
raid1s for the really vital data, leaving the less vital data as simple
2-way-mirror-raid1s.
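
FWIW, some quick back-of-the-envelope arithmetic (mine, not from your
post): given exactly two simultaneous failures among the 26 drives, the
two-way-mirrored 13-pair layout only loses data if both failures land in
the same pair, about a 4% chance, and even then it's only one drive's
worth of data.  The single-parity 25+1 layout loses all 25 drives of
data with certainty.  Something like:

  from math import comb

  # 13 two-way mirror pairs across 26 drives: data lost only if both
  # failures hit the same pair.
  print(13 / comb(26, 2))   # 0.04 -> 4%, and only 1 drive's data is lost

  # Single parity across all 26 (25 data + 1 parity): ANY two failures
  # lose the whole 25 drives of data, i.e. probability 1.0.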
Or raid6 and get loss-of-two tolerance, but as this whole subthread is
discussing, that can be problematic for thruput. (I've occasionally seen
reference to raid7, which is said to be 3-way-parity, loss-of-three-
tolerance, but AFAIK there's no support for it in the kernel, and I
wouldn't be surprised if all implementations are proprietary.  AFAIK, in
practice, once that many devices get involved, what gets implemented is
raid10 with N-way mirroring on the raid1 portion, or some other
multi-level raid scheme.)
> If I had 2 drives of data to protect and had 4 spare drives to do it
> with, I doubt I'd set up a 3x raid-1/5/10 setup (or whatever you want to
> call it - imho raid "levels" are poorly named as there really is just
> striping and mirroring and adding RS parity and everything else is just
> combinations). Instead I'd probably set up a RAID1/5/10/whatever with
> single redundancy for faster storage and recovery, and an offline backup
> (compressed and with incrementals/etc). The backup gets you more
> security and you only need it in a very unlikely double-failure. I'd
> only invest in multiple redundancy in the event that the risk-weighted
> cost of having the node go down exceeds the cost of the extra drives.
> Frankly in that case RAID still isn't the right solution - you need a
> backup node someplace else entirely as hard drives aren't the only thing
> that can break in your server.
So we're talking six drives: two of data and four "spares" to play with.
Often that's set up as raid10, either two-way-striped and 3-way-mirrored,
or 3-way-striped and 2-way-mirrored, depending on whether the loss-of-two
tolerance of 3-way-mirroring or the thruput of 3-way-striping is
considered of higher value.
You're right that at that level, you DO need a real backup, and it should
take priority over raid-whatever.  HOWEVER, instead of creating a SINGLE
raid across all those drives, it's possible to partition them up and
create multiple raids out of the partitions, with one set being a backup
of the other.  And since you've already stated that there's only two
drives' worth of data, there's certainly room enough amongst the six
drives total to do just that.
This is in fact how I ran my raids, both my raid6 config and my raid1
config, for a number of years, and it's how I have my (raid1-mode)
btrfs filesystems set up now on the SSDs.
Effectively I had/have each drive partitioned up into two sets of
partitions, my "working" set and my "backup" set.  Then I md-raided each
partition, at my chosen level, across all devices.  So on each physical
device, partition 5 might be the working rootfs partition, partition 6
the working home partition... partition 9 the backup rootfs partition,
and partition 10 the backup home partition.  They might end up being md3
(rootwork), md4 (homework), md7 (rootbak) and md8 (homebak).
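
As a concrete (purely hypothetical) sketch of that mapping, using the
example partition and md numbers above -- the sdX names are placeholders,
not my actual devices:

  # Same partition layout on every physical drive; each partition is
  # then assembled into one md raid spanning all the drives.
  layout = {
      # partition : (md array, role)
      "sdX5":  ("md3", "working rootfs"),
      "sdX6":  ("md4", "working home"),
      "sdX9":  ("md7", "backup rootfs"),
      "sdX10": ("md8", "backup home"),
  }
  for part, (md, role) in layout.items():
      print(f"{part} -> {md}  ({role})")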
That way, you're protected against physical device death by the
redundancy of the raids, and from fat-fingering or an update gone wrong
by the redundancy of the backup partitions across the same physical
devices.
What's nice about an arrangement such as this is that it gives you quite
a bit more flexibility than you'd have with a single raid, since it's now
possible to decide "Hmm, I don't think I actually need a backup of /var/
log, so I think I'll only run with one log partition/raid, instead of the
usual working/backup arrangement." Similarly, "You know, I ultimately
don't need backups of the gentoo tree and overlays, or of the kernel git
tree, at all, since as Linus says, 'Real men upload it to the net and let
others be their backup', and I can always redownload that from the net,
so I think I'll raid0 this partition and not keep any copies at all,
since re-downloading's less trouble than dealing with the backups
anyway." Finally, and possibly critically, it's possible to say, "You
know, what happens if I've just wiped rootbak in ordered to make a new
root backup, and I have a crash and working-root refuses to boot. I
think I need a rootbak2, and with the space I saved by doing only one log
partition and by making the sources trees raid0, I have room for it now,
without using any more space than I would had I had everything on the
same raid."
Another nice thing about it, and this is what I would have ended up doing
if I hadn't conveniently found that 1 TB USB drive at such a good price,
is that while the whole thing is partitioned up and in use, it's very
possible to wipe out the backup partitions temporarily, and recreate them
as a different raid level or a different filesystem, or otherwise
reorganize that area, then reboot into the new version, and do the same
to what was the working copies. (For the area that was raid0, well, it
was raid0 because it's easy to recreate, so just blow it away and
recreate it on the new layout. And for the single-raid log without a
backup copy, it's simple enough to point the log elsewhere, or keep it
on rootfs long enough to redo that set of partitions across all
physical devices.)
Again, this isn't just theory, it really works, as I've done it to
various degrees at various times, even if I found copying to the external
1 TB USB drive and booting from it more convenient to do when I
transferred from raid6 to raid1.
And since I do run ~arch, there have been a number of times I've needed
to boot to rootbak instead of rootwork, including once when a ~arch
portage was hosing symlinks just as a glibc update came along, thus
breaking glibc (!!), once when a bash update broke, and another time when
a glibc update mostly worked but I needed to downgrade and the protection
built into the glibc ebuild wasn't letting me do it from my working root.
What's nice about this setup in regard to booting to rootbak instead of
the usual working root, is that unlike booting to a liveCD/DVD rescue
disk, you have the full working system installed, configured and running
just as it was when the backup was made. That makes it much easier to
pick up and run from where you left off, with all the tools you're used
to having and modes of working you're used to using, instead of being
limited to some artificial rescue environment, often with limited tools,
and in any case set up and configured differently from your own system,
because rootbak IS your own system, just from a few days/weeks/months
ago, whenever it was that you last did the backup.
Anyway, with the parameters you specified, two drives full of data and
four spare drives (presumably of a similar size), there's a LOT of
flexibility:

* raid10 across four drives (two-mirror, two-stripe) with the other two
  as backup (this would probably be my choice given the 2-disks-of-data,
  6-disks-total constraints, but see below, and it appears this might be
  your choice as well);

* raid6 across four drives (two data, two parity) with two as backups
  (not a choice I'd likely make, but a choice);

* a working pair of drives plus two sets of backups (not a choice I'd
  likely make);

* raid10 across all six drives, in either 3-mirror/2-stripe or
  3-stripe/2-mirror mode (I'd probably elect for this with
  3-stripe/2-mirror for the 3X speed and space, and prioritize a
  separate backup, see the discussion below);

* two independent 3-disk raid5s (IMO there are better options for most
  cases, with the possible exception of primarily slow-media usage, tho
  just which options are better depends on usage and priorities);

* or some hybrid combination of these.
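
For a rough comparison of those options (my own back-of-the-envelope
numbers, assuming six equal-size drives and counting only the number of
arbitrary drive losses the working array is guaranteed to survive):

  # (layout, usable working capacity in drives, guaranteed losses survived)
  options = [
      ("raid10 on 4 (2-stripe x 2-mirror), 2 drives as backup", 2, 1),
      ("raid6 on 4 (2 data + 2 parity), 2 drives as backup",    2, 2),
      ("raid10 on 6, 3-stripe x 2-mirror",                      3, 1),
      ("raid10 on 6, 2-stripe x 3-mirror",                      2, 2),
      ("two 3-drive raid5s, one as backup of the other",        2, 1),
  ]
  for name, usable, losses in options:
      print(f"{name}: {usable} drives usable, survives any {losses} loss(es)")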
> This sort of rationale is why I don't like arguments like "RAM is cheap"
> or "HDs are cheap" or whatever. The fact is that wasting money on any
> component means investing less in some other component that could give
> you more space/performance/whatever-makes-you-happy. If you have $1000
> that you can afford to blow on extra drives then you have $1000 you
> could blow on RAM, CPU, an extra server, or a trip to Disney. Why not
> blow it on something useful?
[ This gets philosophical. OK to quit here if uninterested. ]
You're right. "RAM and HDs are cheap"... relative to WHAT, the big-
screen TV/monitor I WOULD have been replacing my much smaller monitor
with, if I hadn't been spending the money on the "cheap" RAM and HDs?
Of course, "time is cheap" comes with the same caveats, and can actually
end up being far more dear. Stress and hassle of administration
similarly. And sometimes, just a bit of investment in another
"expensive" HD, saves you quite a bit of "cheap" time and stress, that's
actually more expensive.
"It's all relative"... to one's individual priorities. Because one
thing's for sure, both money and time are fungible, and if they aren't
spent on one thing, they WILL be on another (even if that "spent" is
savings, for money), and ultimately, it's one's individual priorities
that should rank where that spending goes. And I can't set your
priorities and you can't set mine, so... But from my observation, a LOT
of folks don't realize that and/or don't take the time necessary to
reevaluate their own priorities from time to time, so end up spending out
of line with their real priorities, and end up rather unhappy people as a
result! That's one reason why I have a personal policy to deliberately
reevaluate personal priorities from time to time (as well as being aware
of them constantly), and rearrange spending, money, time and otherwise, in
accordance with those reranked priorities. I'm absolutely positive I'm a
happier man for doing so! =:^)
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman