From: Mark Knecht <markknecht@gmail.com>
To: gentoo-user@lists.gentoo.org
Subject: Re: [gentoo-user] 1-Terabyte drives - 4K sector sizes? -> bar performance so far
Date: Sun, 7 Feb 2010 12:31:24 -0800
Message-ID: <5bdc1c8b1002071231pd809728y69e4f5e7eede9918@mail.gmail.com>
In-Reply-To: <20100207193947.GB30196@math.princeton.edu>
On Sun, Feb 7, 2010 at 11:39 AM, Willie Wong <wwong@math.princeton.edu> wrote:
> On Sun, Feb 07, 2010 at 08:27:46AM -0800, Mark Knecht wrote:
>> <QUOTE>
>> 4KB physical sectors: KNOW WHAT YOU'RE DOING!
>>
>> Pros: Quiet, cool-running, big cache
>>
>> Cons: The 4KB physical sectors are a problem waiting to happen. If you
>> misalign your partitions, disk performance can suffer. I ran
>> benchmarks in Linux using a number of filesystems, and I found that
>> with most filesystems, read performance and write performance with
>> large files didn't suffer with misaligned partitions, but writes of
>> many small files (unpacking a Linux kernel archive) could take several
>> times as long with misaligned partitions as with aligned partitions.
>> WD's advice about who needs to be concerned is overly simplistic,
>> IMHO, and it's flat-out wrong for Linux, although it's probably
>> accurate for 90% of buyers (those who run Windows or Mac OS and use
>> their standard partitioning tools). If you're not part of that 90%,
>> though, and if you don't fully understand this new technology and how
>> to handle it, buy a drive with conventional 512-byte sectors!
>> </QUOTE>
>>
>> Now, I don't mind getting a bit dirty learning to use this
>> correctly but I'm wondering what that means in a practical sense.
>> Reading the mke2fs man page the word 'sector' doesn't come up. It's my
>> understanding the Linux 'blocks' are groups of sectors. True? If the
>> disk must use 4K sectors then what - the smallest block has to be 4K
>> and I'm using 1 sector per block? It seems that ext3 doesn't support
>> anything larger than 4K?
>
> The problem is not when you are making the filesystem with mke2fs, but
> when you partitioned the disk using fdisk. I'm sure I am making some
> small mistakes in the explanation below, but it goes something like
> this:
>
> a) The hard drive with 4K sectors allows the head to efficiently
> read/write 4K sized blocks at a time.
> b) However, for hardware compatibility, the hard drive allows 512B
> sized blocks to be addressed. In reality, this means that you can
> individually address the 8 512B-sized chunks of the 4K sized blocks,
> but each will count as a separate operation. To illustrate: say the
> hardware has some sector X of size 4K. It has 8 addressable slots
> inside it, X1 ... X8, each of size 512B. If your OS clusters read/writes on
> the 512B level, it will send 8 commands to read the info in those 8
> blocks separately. If your OS clusters in 4K, it will send one
> command. So in the stupid analysis I give here, it will take 8 times
> as long for the 512B addressing to read the same data, since it will
> take 8 passes, and each time inefficiently reading only 1/8 of the
> data required. Now in reality, drives are smarter than that: if all 8
> of those are sent in sequence, sometimes the drives will cluster them
> together in one read.
> c) A problem occurs, however, when your OS deals with 4K clusters but
> when you make the partition, the partition is offset! Imagine the
> physical read sectors of your disk looking like
>
> AAAAAAAABBBBBBBBCCCCCCCCDDDDDDDD
>
> but when you make your partitions, somehow you partitioned it
>
> ....YYYYYYYYZZZZZZZZWWWWWWWW....
>
> This is possible because the drive allows addressing by 512B chunks.
> So for some reason one of your partitions starts halfway inside a
> physical sector. What is the problem with this? Now suppose your OS
> sends data to be written to the ZZZZZZZZ block. If it were completely
> aligned, the drive would just move the head to the block and
> overwrite it with this information. But since half of the block is
> over the BBBB physical sector, and half over CCCC, what the disk now
> needs to do is to
>
> pass 1) read BBBBBBBB
> pass 2) modify the second half of BBBB to match the first half of ZZZZ
> pass 3) write BBBBBBBB
> pass 4) read CCCCCCCC
> pass 5) modify the first half of CCCC to match the second half of ZZZZ
> pass 6) write CCCCCCCC
>
> This is what is known as a read-modify-write operation, and it makes
> the disk a lot less efficient.
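A minimal Python sketch of the arithmetic in (c), assuming 512-byte logical sectors, 4096-byte physical sectors, and the 4-sector partition offset drawn in the diagram. The helper names are hypothetical, and real drive firmware may buffer or merge writes, so this only models the simplified picture described here.

```python
LOGICAL = 512                   # bytes per addressable (logical) sector
PHYSICAL = 4096                 # bytes per physical sector (assumed 4 KiB)
PER_PHYS = PHYSICAL // LOGICAL  # 8 logical sectors per physical sector


def physical_sectors_touched(start_lba, n_lba):
    """Physical sector indices covered by a write of n_lba logical
    sectors starting at logical sector start_lba."""
    first = start_lba // PER_PHYS
    last = (start_lba + n_lba - 1) // PER_PHYS
    return range(first, last + 1)


def needs_read_modify_write(start_lba, n_lba):
    """True if the write only partially covers a physical sector at
    either end, so the drive must read, patch, and rewrite it."""
    return start_lba % PER_PHYS != 0 or (start_lba + n_lba) % PER_PHYS != 0


# Diagram: the partition starts 4 logical sectors (half a physical
# sector) into the disk, so block ZZZZZZZZ begins at LBA 4 + 8 = 12.
z_block = 4 + 8
print(list(physical_sectors_touched(z_block, 8)))  # [1, 2] -> BBBB and CCCC
print(needs_read_modify_write(z_block, 8))         # True

# The same 4K block on an aligned partition lands in one physical sector.
print(list(physical_sectors_touched(8, 8)))        # [1]
print(needs_read_modify_write(8, 8))               # False
```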
>
> ----------
>
> Now, I don't know if this is actually what is causing your
> performance problems, but it may be. When you use fdisk, it
> defaults to aligning the partition to cylinder boundaries, and uses the
> default (from ancient times) value of 63 x (512B sized) sectors per
> track. Since 63 is not evenly divisible by 8, quite likely some of
> your partitions are not aligned to the physical sector boundaries.
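To make the divisibility point concrete, here is a small Python sketch using the 255-head, 63-sector geometry reported later in this thread. It assumes the classic DOS-style layout in which the first partition starts one track in (sector 63) and later partitions start on cylinder boundaries; `misalignment` is a hypothetical helper.

```python
HEADS = 255
SECTORS_PER_TRACK = 63
CYLINDER = HEADS * SECTORS_PER_TRACK  # 16065 logical (512-byte) sectors
PER_PHYS = 8                          # 512-byte sectors per 4 KiB physical sector


def misalignment(lba):
    """How many 512-byte sectors lba sits past a 4 KiB physical boundary."""
    return lba % PER_PHYS


# Assumed layout: first partition starts one track in, at sector 63.
print(misalignment(SECTORS_PER_TRACK))  # 7

# Cylinder boundaries: 16065 % 8 == 1, so the offset drifts by one
# sector per cylinder and only every 8th cylinder boundary is aligned.
print([misalignment(c * CYLINDER) for c in range(1, 9)])  # [1, 2, 3, 4, 5, 6, 7, 0]
```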
>
> If you use cfdisk, you can try to change the geometry with the command
> g. Or you can use the command u to change the units used in the
> partitioning to either sectors or megabytes, and make sure your
> partition sizes are a multiple of 8 in the former, or an integer in
> the latter.
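A rough Python sketch of that rule, assuming partitions are laid out back to back with no gaps and sizes are given in 512-byte sectors; `partition_starts` is a hypothetical helper, not part of cfdisk. One detail it makes visible: sizes that are multiples of 8 (or whole megabytes, since 1 MiB is 2048 sectors) keep every start aligned only if the first partition itself starts on a multiple of 8.

```python
PER_PHYS = 8            # 512-byte sectors per 4 KiB physical sector
SECTORS_PER_MIB = 2048  # 1 MiB / 512 bytes, itself a multiple of 8


def partition_starts(first_start, sizes):
    """Yield (start_sector, aligned?) for contiguous partitions.

    Assumes no gaps between partitions; sizes are in 512-byte sectors.
    """
    start = first_start
    for size in sizes:
        yield start, start % PER_PHYS == 0
        start += size


# Whole-MiB sizes with an aligned first start: every start is aligned.
print(list(partition_starts(2048, [100 * SECTORS_PER_MIB, 500 * SECTORS_PER_MIB])))
# [(2048, True), (206848, True)]

# Multiple-of-8 sizes do not help if the first partition starts at 63.
print(list(partition_starts(63, [800, 1600])))
# [(63, False), (863, False)]
```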
>
> Again, take what I wrote with a grain of salt: this information came
> from the research I did a little while back after reading the Slashdot
> article on this 4K switch. Since it is only my own understanding, it
> may not be completely correct.
>
> HTH,
>
> W
> --
> Willie W. Wong wwong@math.princeton.edu
> Data aequatione quotcunque fluentes quantitae involvente fluxiones invenire
> et vice versa ~~~ I. Newton
>
>
Willie,
Thanks. Your description above is pretty much consistent (I think)
with the information I found at the WD site explaining how the data is
physically packed on the drive. Since I have the OS set up on a
different drive, I was able to blow away all the partitions, so I
just created one large 1T partition, but I think that doesn't address
the exact problem you outline.
I'll have to study how to change the geometry. I do see that cfdisk
is reporting 255/63/121601. Am I to choose a size that is __smaller__
than 63 but a multiple of 8, i.e., 56? And if I do that, does the
partitioning of the drive just ignore those last 7 sectors and reduce
capacity to 56/63 of the original, or about 11% less?
Or is it legal to push the number of sectors up to 64? I would have
thought that the sector count would be driven by really low level
formatting and I shouldn't be messing with that.
Assuming I have done what you are suggesting, then with 7
blocks/track, do I need to choose the starting position of each
partition so that it is aligned to the start of a new 8-sector block?
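If the partitioning is done in sector units (cfdisk's u command, as Willie suggests), choosing an aligned start reduces to rounding a proposed start sector up to the next multiple of 8. A hedged Python sketch; `next_aligned_start` is a hypothetical helper, and in this model the space skipped is at most 7 sectors (3.5 KiB) per partition.

```python
PER_PHYS = 8  # 512-byte sectors per 4 KiB physical sector


def next_aligned_start(lba):
    """Round a proposed start sector up to the next 8-sector boundary."""
    return (lba + PER_PHYS - 1) // PER_PHYS * PER_PHYS


for proposed in (63, 64, 16065):
    aligned = next_aligned_start(proposed)
    print(proposed, "->", aligned, "skips", aligned - proposed, "sectors")
# 63 -> 64 skips 1 sectors
# 64 -> 64 skips 0 sectors
# 16065 -> 16072 skips 7 sectors
```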
It's very strange that the disk industry chose anything that's not
2^X but I guess they did.
As per your and Volker's suggestions I'm going to study the proper
way to align partitions before I do anything more. I did find a small
program called 'fio' that does some interesting drive testing
including seek time testing. I need to study how to really use it
though. It can set up multiple threads to simulate loads that are more
real-world like.
Thanks to you both for the responses.
Cheers,
Mark