From: Mark Knecht <markknecht@gmail.com>
To: Gentoo AMD64 <gentoo-amd64@lists.gentoo.org>
Subject: Re: [gentoo-amd64] Re: Is my RAID performance bad possibly due to starting sector value?
Date: Fri, 21 Jun 2013 10:40:48 -0700
Message-ID: <CAK2H+ecNGaQ7BfAaWtLkQhg3T-pC56tejDKL6kK+qydLa8YWyg@mail.gmail.com>
In-Reply-To: <pan$ecf3f$9af69a78$1508667e$d81347b7@cox.net>
On Fri, Jun 21, 2013 at 12:31 AM, Duncan <1i5t5.duncan@cox.net> wrote:
> Mark Knecht posted on Thu, 20 Jun 2013 12:10:04 -0700 as excerpted:
>
>> Does anyone know of info on how the starting sector number might
>> impact RAID performance under Gentoo? The drives are WD-500G RE3 drives
>> shown here:
>>
>> http://www.amazon.com/Western-Digital-WD5002ABYS-3-5-inch-Enterprise/dp/
>> B001EMZPD0/ref=cm_cr_pr_product_top
>>
>> These are NOT 4k sector sized drives.
>>
>> Specifically, I'm running a 5-drive RAID6 for about 1.45TB of storage. My
>> benchmarking seems abysmal at around 40MB/S using dd copying large
>> files.
>> It's higher, around 80MB/S if the file being transferred is coming from
>> an SSD, but even 80MB/S seems slow to me. I see a LOT of wait time in
>> top.
>> And my 'large file' copies might not be large enough as the machine has
>> 24GB of DRAM and I've only been copying 21GB so it's possible some of
>> that is cached.
>
> I /suspect/ that the problem isn't striping, tho that can be a factor,
> but rather, your choice of raid6. Note that I personally ran md/raid-6
> here for awhile, so I know a bit of what I'm talking about. I didn't
> realize the full implications of what I was setting up originally, or I'd
> have not chosen raid6 in the first place, but live and learn as they say,
> and that I did.
>
> General rule, raid6 is abysmal for writing and gets dramatically worse as
> fragmentation sets in, tho reading is reasonable. The reason is that in
> order to properly parity-check and write out less-than-full-stripe
> writes, the system must effectively read-in the existing data and merge
> it with the new data, then recalculate the parity, before writing the new
> data AND 100% of the (two-way in raid-6) parity. Further, because raid
> sits below the filesystem level, it knows nothing about what parts of the
> filesystem are actually used, and must read and write the FULL data
> stripe (perhaps minus the new data bit, I'm not sure), including parts
> that will be empty on a freshly formatted filesystem.
>
> So with 4k block sizes on a 5-device raid6, you'd have 20k stripes, 12k
> in data across three devices, and 8k of parity across the other two
> devices. Now you go to write a 1k file, but in order to do so the full
> 12k of existing data must be read in, even on an empty filesystem,
> because the RAID doesn't know it's empty! Then the new data must be
> merged in and new checksums created, then the full 20k must be written
> back out, certainly the 8k of parity, but also likely the full 12k of
> data even if most of it is simply rewrite, but almost certainly at least
> the 4k strip on the device the new data is written to.
>
<SNIP>
Hi Duncan,
Wonderful post, but much too long to carry on a conversation
in-line. As you sound pretty sure of your understanding and history
I'll assume you're right 100% of the time, but only maybe 80% of the
post feels right to me at this point, so let's assume I have much to
learn and go from there. I expect that others here are in a similar
situation to mine - they use RAID but have little hard data on what
the different parts of the system are doing or how to optimize them.
I certainly feel that's true in my case. I hope this thread, over the
near or long term, helps me and potentially others.
Thinking about this issue this morning, it seems important to get
back to basics and verify as much as possible, step by step, so that
I don't layer good work on top of bad assumptions. To that end, and
before I move much farther forward, let me document a few things
about my system and the hardware available to work with, and see if
you, Rich, Bob, Volker or anyone else wants to chime in about what is
correct, what isn't, or a better way to use it.
Basic machine - ASUS Rampage II Extreme motherboard (4/1/2010) + 24GB
DDR3 + Core i7-980X Extreme 6-core/12-thread processor
1 SSD - 120GB SATA3 on its own controller
5+ HDD - WD5002ABYS RAID Edition 3 SATA3 drives using the Intel
integrated controllers
(NOTE: I could possibly go to a 6-drive RAID if I made some changes
in the box, but that's for later)
According to the WD spec
(http://www.wdc.com/en/library/spec/2879-701281.pdf) the 500GB drives
sustain 113MB/S to the drive. Using hdparm I measure 107MB/S or higher
for all 5 drives:
c2RAID6 ~ # hdparm -tT /dev/sdb
/dev/sdb:
Timing cached reads: 17374 MB in 2.00 seconds = 8696.12 MB/sec
Timing buffered disk reads: 322 MB in 3.00 seconds = 107.20 MB/sec
c2RAID6 ~ #
The SSD on its own PCI Express controller clocks in at about 250MB/S for reads.
c2RAID6 ~ # hdparm -tT /dev/sda
/dev/sda:
Timing cached reads: 17492 MB in 2.00 seconds = 8754.42 MB/sec
Timing buffered disk reads: 760 MB in 3.00 seconds = 253.28 MB/sec
c2RAID6 ~ #
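Since the subject line asks about the starting sector, for
completeness I can also check where each RAID partition starts
(assuming the array members are sdb1-style partitions rather than
whole drives - adjust names to taste) with something like:

fdisk -l /dev/sdb
cat /sys/block/sdb/sdb1/start

I can post that output in a follow-up if anyone thinks alignment
still matters on these 512-byte-sector drives.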
TESTING: I'm using dd to test. It gives an easy-to-read result and
seems to be widely used. I can use bonnie++ or IOzone later but I
don't think that's necessary quite yet. Since I have 24GB of RAM and
don't want cached data to affect the test speeds, I do the following:
1) Using dd I created a 50GB file for copying using the following commands:
cd /mnt/fastVM
dd if=/dev/random of=random1 bs=1000 count=0 seek=$[1000*1000*50]
mark@c2RAID6 /VirtualMachines/bonnie $ ls -alh /mnt/fastVM/ran*
-rw-r--r-- 1 mark mark 47G Jun 21 07:10 /mnt/fastVM/random1
mark@c2RAID6 /VirtualMachines/bonnie $
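(One thing I noticed while writing this up: with count=0 that dd
never actually reads from /dev/random, it just extends the output
file, so random1 is presumably a sparse file that reads back as
zeros. If that turns out to skew the copy tests I can regenerate it
with real data, something along the lines of:

dd if=/dev/urandom of=/mnt/fastVM/random2 bs=1M count=50000

though that would take a while to write out.)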
2) To ensure that nothing is cached and the copies are (hopefully)
completely fair, as root I do the following between each test:
sync
free -h
echo 3 > /proc/sys/vm/drop_caches
free -h
An example:
c2RAID6 ~ # sync
c2RAID6 ~ # free -h
total used free shared buffers cached
Mem: 23G 23G 129M 0B 8.5M 21G
-/+ buffers/cache: 1.6G 21G
Swap: 12G 0B 12G
c2RAID6 ~ # echo 3 > /proc/sys/vm/drop_caches
c2RAID6 ~ # free -h
total used free shared buffers cached
Mem: 23G 2.6G 20G 0B 884K 1.3G
-/+ buffers/cache: 1.3G 22G
Swap: 12G 0B 12G
c2RAID6 ~ #
3) As a first test I use dd to copy the 50GB file from the SSD to the
RAID6. As long as reading the SSD is much faster than writing to the
RAID6, this should primarily be a test of the RAID6 write speed:
mark@c2RAID6 /VirtualMachines/bonnie $ dd if=/mnt/fastVM/random1 of=SDDCopy
97656250+0 records in
97656250+0 records out
50000000000 bytes (50 GB) copied, 339.173 s, 147 MB/s
mark@c2RAID6 /VirtualMachines/bonnie $
If I clear the cache as above and rerun the test, it's always 145-155MB/S.
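One variable I haven't controlled for is the dd block size - with no
bs= argument dd uses 512-byte blocks (hence the 97656250 records
above). At some point I may rerun the copy with a larger block size
and an explicit flush at the end, something like:

dd if=/mnt/fastVM/random1 of=SDDCopy bs=1M conv=fdatasync

but all the numbers in this message are with the defaults.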
4) As a second test I read from the RAID6 and write back to the RAID6.
I see MUCH lower speeds, again repeatable:
mark@c2RAID6 /VirtualMachines/bonnie $ dd if=SDDCopy of=HDDWrite
97656250+0 records in
97656250+0 records out
50000000000 bytes (50 GB) copied, 1187.07 s, 42.1 MB/s
mark@c2RAID6 /VirtualMachines/bonnie $
5) As a final test, just looking for problems if any, I do an SSD to
SSD copy, which clocks in at close to 200MB/S:
mark@c2RAID6 /mnt/fastVM $ dd if=random1 of=SDDCopy
97656250+0 records in
97656250+0 records out
50000000000 bytes (50 GB) copied, 251.105 s, 199 MB/s
mark@c2RAID6 /mnt/fastVM $
So, since this RAID6 was grown yesterday from something that has
existed for a year or two, I'm not sure of its fragmentation, or even
how to determine that at this time. However, it seems my problem is
RAID6 reads, not RAID6 writes, at least to new and probably never-used
disk space.
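If it would help I can try to get a rough read on fragmentation.
Assuming the array is /dev/md0 and carries an ext3/ext4 filesystem
(and that I'm pointing the tools at the right things), something like:

e2freefrag /dev/md0        (free-space fragmentation summary)
filefrag -v SDDCopy        (extent layout of one of the test files)

should at least show whether the free space is already chopped up.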
I will report more later, but I can state that, just watching top,
there's never much CPU usage during these tests but a LOT of wait
time when reading the RAID6. It really appears the system is spinning
its wheels waiting for the RAID to get data from the disks.
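Next time I run these I'll try to capture something more quantitative
than top, e.g. iostat from the sysstat package watching the member
drives and the md device while the copy runs (device names below are
just my assumption for this box):

iostat -x /dev/sd[b-f] /dev/md0 5

That should show per-drive utilization and average wait times rather
than just the overall wait percentage top reports.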
One place where I wanted to double-check your thinking: my thought
is that a RAID1 will _NEVER_ outperform the hdparm -tT read speeds,
as it has to read from three drives and make sure they are all good
before returning data to the user. I don't see how that could ever be
faster than what a single-drive filesystem could do, which for these
drives would be the 113MB/S WD spec number, correct? As I'm currently
getting 145MB/S, it appears on the surface that the RAID6 is providing
some value, at least in these early days of use. Maybe it will degrade
over time, though.
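Just to put rough numbers on my expectations (back-of-the-envelope
only, and assuming md can keep all five spindles streaming - a big
assumption on my part): a 5-drive RAID6 has 3 data chunks per stripe,
so the best case I'd expect is roughly

   sequential read:    ~3 x 113MB/S = ~340MB/S
   full-stripe write:  ~3 x 113MB/S = ~340MB/S (parity computed on the fly)
   single drive:          113MB/S

so the 145MB/S writes are comfortably above a single drive but well
short of the theoretical ceiling, and the 42MB/S read+write case is
the real outlier.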
Comments?
Cheers,
Mark