I would recommend that anyone concerned about mdadm software RAID performance on Gentoo test with tools like bonnie++ before putting any data on the drives, and separate the data into different sets/volumes.
I did that testing two years ago, watching burst and sustained read/write rates, file ops per second, etc. I ended up with seven 2TB enterprise drives, laid out as follows (a sample bonnie++ invocation follows the list):
Disk 1 is the OS, no RAID
Disks 2-5 are data, RAID 10
Disks 6-7 are backup and test/scratch space, RAID 0
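A minimal sketch of the kind of run I mean (the mount point is just a placeholder, and the -s size should be roughly twice your RAM so the page cache can't mask the real disk speed):

# writes/reads a 16GB test set in /mnt/testvol, dropping root to user nobody
bonnie++ -d /mnt/testvol -s 16384 -n 128 -u nobody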
On Fri, Jun 21, 2013 at 3:28 AM, Rich Freeman <rich0@gentoo.org> wrote:
> On Fri, Jun 21, 2013 at 3:31 AM, Duncan <1i5t5.duncan@cox.net> wrote:
>> So with 4k block sizes on a 5-device raid6, you'd have 20k stripes, 12k
>> in data across three devices, and 8k of parity across the other two
>> devices.
>
> With mdadm on a 5-device raid6 with 512K chunks you have 1.5M in a
> stripe, not 20k. If you modify one block it needs to read all 1.5M,
> or it needs to read at least the old chunk on the single drive to be
> modified and both old parity chunks (which on such a small array is 3
> disks either way).
>
Hi Rich,
I've been rereading everyone's posts as well as trying to collect
my own thoughts. One question I have at this point, given that you and
I seem to be the two non-RAID1 users (but not necessarily devotees) at
the moment, is what chunk size, stride & stripe width you are using.
Are you currently using 512K chunks on your RAID5? If so, that's
potentially quite different from my 16K-chunk RAID6. The more I read
through this thread and other things on the web, the more I am
concerned that 16K chunks may force far more IO operations than
really makes sense for performance. Unfortunately there's no easy way
for me to really test this right now, as the RAID6 uses the whole
drive. However, for every 512K I want to get off the drive you might
need 1 chunk whereas I'm going to need what, 32 chunks? That's got to
be a lot more IO operations on my machine, isn't it?
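Spelling out that arithmetic (shell arithmetic purely for
illustration, using a 512K read as the example unit):

c2RAID6 ~ # echo $(( 512 / 512 ))   # chunks per 512K read, 512K chunk size
1
c2RAID6 ~ # echo $(( 512 / 16 ))    # chunks per 512K read, 16K chunk size
32

The full-stripe sizes work out the same way: a 5-device RAID6 has
(5 - 2) data chunks per stripe, so 3 x 512K = 1.5M on your layout
versus 3 x 16K = 48K on mine.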
For clarity, I'm running a 16K chunk, a stride of 4 blocks and a stripe width of 12 blocks:
c2RAID6 ~ # tune2fs -l /dev/md3 | grep RAID
Filesystem volume name: RAID6root
RAID stride: 4
RAID stripe width: 12
c2RAID6 ~ #
c2RAID6 ~ # cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md3 : active raid6 sdb3[9] sdf3[5] sde3[6] sdd3[7] sdc3[8]
1452264480 blocks super 1.2 level 6, 16k chunk, algorithm 2 [5/5] [UUUUU]
unused devices: <none>
c2RAID6 ~ #
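(Those tune2fs numbers are in filesystem blocks, which is how they
line up with the chunk size, assuming the usual 4K ext block size:
stride = chunk / block = 16K / 4K = 4 blocks, and stripe width =
stride x data disks = 4 x (5 - 2) = 12 blocks, i.e. 48K of data per
full stripe.)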
As I understand one of your earlier responses, I think you are using
4K-sector drives, which again adds that extra level of complexity
when creating the partitions initially, but after that should be
fairly straightforward to use. (I think.) That said, there are
trade-offs between RAID5 & RAID6, but have you measured speeds using
anything like the dd method I posted yesterday, or any other way that
we could compare?
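For reference, the sort of thing I mean is roughly this (the test
file path and size are just placeholders, not yesterday's exact
invocation, and the cache drop needs root):

c2RAID6 ~ # sync; echo 3 > /proc/sys/vm/drop_caches
c2RAID6 ~ # dd if=/dev/zero of=/ddtest bs=1M count=4096 conv=fdatasync
c2RAID6 ~ # sync; echo 3 > /proc/sys/vm/drop_caches
c2RAID6 ~ # dd if=/ddtest of=/dev/null bs=1M

The first dd gives a sustained sequential write number (fdatasync
makes dd wait until the data actually hits the platters); the second
gives a sequential read with the page cache emptied so it can't cheat.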
As I think Duncan asked about storage usage requirements in another
part of this thread, I'll just document it here. The machine serves 3
main purposes for me:
1) It's my day-in, day-out desktop. I run almost totally Gentoo
64-bit stable unless I need to keyword a package to get what I need.
Over time I tend to let my keyworded packages go stable if they are
working for me. The overall storage requirements for this, including
my home directory, typically don't run over 50GB.
2) The machine runs 3 Windows VMs every day - 2 Win 7 & 1 Win XP.
Total storage for the basic VMs is about 150GB. XP is just for things
like Netflix. These 3 VMs typically have 9 cores allocated to them
(6+2+1), leaving 3 for Gentoo to run the hardware, etc. The 6-core VM
often uses 80-100% of its CPUs, sustained for hours to days at a
time. It's doing a lot of stock market math...
3) More recently, and really the reason to consolidate into a single
RAID of any type, I have about 900GB of mp4s which have been on an
external USB drive, backed up to a second USB drive. However, this is
mostly storage. We watch most of this video on the TV, using the
second copy drive hooked directly to the TV or copied onto Kindles.
I've been having to keep multiple backups of this outside the machine
(poor man's RAID1 - two separate USB drives hooked up one at a time!)
;-) I'd rather just keep it safe on the RAID6. That said, I've not
yet put it on the RAID6, as I have these performance issues I'd like
to solve first. (If possible. Duncan is making me worry that they
cannot be solved...)
Lastly, even if I completely buy into Duncan's well-formed reasons
why RAID1 might be faster, using 500GB drives I see no single RAID
solution for me other than RAID5/6. The real RAID1/RAID6 comparison
from a storage standpoint would be a (conceptual) 3-drive RAID6 vs a
3-drive RAID1. Both create 500GB of storage and can (conceptually)
lose 2 drives and still recover the data. However, adding another
drive to the RAID1 gains you more speed but no storage (buying into
Duncan's points), whereas adding a drive to the RAID6 adds storage
but probably reduces speed. As I need storage, what other choices do
I have?
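(Spelling out the capacity arithmetic with 500GB drives: an n-drive
RAID1 stays at 500GB usable no matter how big n gets, while an
n-drive RAID6 gives (n - 2) x 500GB - 3 drives yield 500GB, 4 yield
1TB, and my 5 yield the ~1.5TB I have now.)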
Answering myself: take the 5 drives and create two RAIDs - a 500GB
2-drive RAID1 for the system + VMs, and then a 3-drive RAID5 for the
video data, maybe? I don't know...
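If I went that way, the mdadm side would be something like this
sketch (device names purely hypothetical, and I'd have to juggle the
existing data off the drives first):

c2RAID6 ~ # mdadm --create /dev/md4 --level=1 --raid-devices=2 /dev/sdb2 /dev/sdc2
c2RAID6 ~ # mdadm --create /dev/md5 --level=5 --chunk=512 --raid-devices=3 /dev/sdd2 /dev/sde2 /dev/sdf2

That would at least let the system/VM array and the big sequential
video array be tuned separately.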
Or buy more hardware and do a 2-drive SSD RAID1 for the system, or a
hardware RAID controller, etc. The options explode if I start buying
more hardware.
Also, THANKS TO EVERYONE for the continued conversation.
Cheers,
Mark