public inbox for gentoo-user@lists.gentoo.org
From: Mark Knecht <markknecht@gmail.com>
To: gentoo-user@lists.gentoo.org
Subject: Re: [gentoo-user] 1-Terabyte drives - 4K sector sizes? -> bar  performance so far
Date: Sun, 7 Feb 2010 12:31:24 -0800
Message-ID: <5bdc1c8b1002071231pd809728y69e4f5e7eede9918@mail.gmail.com>
In-Reply-To: <20100207193947.GB30196@math.princeton.edu>

On Sun, Feb 7, 2010 at 11:39 AM, Willie Wong <wwong@math.princeton.edu> wrote:
> On Sun, Feb 07, 2010 at 08:27:46AM -0800, Mark Knecht wrote:
>> <QUOTE>
>> 4KB physical sectors: KNOW WHAT YOU'RE DOING!
>>
>> Pros: Quiet, cool-running, big cache
>>
>> Cons: The 4KB physical sectors are a problem waiting to happen. If you
>> misalign your partitions, disk performance can suffer. I ran
>> benchmarks in Linux using a number of filesystems, and I found that
>> with most filesystems, read performance and write performance with
>> large files didn't suffer with misaligned partitions, but writes of
>> many small files (unpacking a Linux kernel archive) could take several
>> times as long with misaligned partitions as with aligned partitions.
>> WD's advice about who needs to be concerned is overly simplistic,
>> IMHO, and it's flat-out wrong for Linux, although it's probably
>> accurate for 90% of buyers (those who run Windows or Mac OS and use
>> their standard partitioning tools). If you're not part of that 90%,
>> though, and if you don't fully understand this new technology and how
>> to handle it, buy a drive with conventional 512-byte sectors!
>> </QUOTE>
>>
>>    Now, I don't mind getting a bit dirty learning to use this
>> correctly but I'm wondering what that means in a practical sense.
>> Reading the mke2fs man page the word 'sector' doesn't come up. It's my
>> understanding the Linux 'blocks' are groups of sectors. True? If the
>> disk must use 4K sectors then what - the smallest block has to be 4K
>> and I'm using 1 sector per block? It seems that ext3 doesn't support
>> anything larger than 4K?
>
> The problem is not when you are making the filesystem with mke2fs, but
> when you partitioned the disk using fdisk. I'm sure I am making some
> small mistakes in the explanation below, but it goes something like
> this:
>
> a) The harddrive with 4K sectors allows the head to efficiently
> read/write 4K sized blocks at a time.
> b) However, to be compatible in hardware, the harddrive allows 512B
> sized blocks to be addressed. In reality, this means that you can
> individually address the 8 512B-sized chunks of the 4K sized blocks,
> but each will count as a separate operation. To illustrate: say the
> hardware has some sector X of size 4K. It has 8 addressable slots
> inside X1 ... X8 each of size 512B. If your OS clusters read/writes on
> the 512B level, it will send 8 commands to read the info in those 8
> blocks separately. If your OS clusters in 4K, it will send one
> command. So in the stupid analysis I give here, it will take 8 times
> as long for the 512B addressing to read the same data, since it will
> take 8 passes, and each time inefficiently reading only 1/8 of the
> data required. Now in reality, drives are smarter than that: if all 8
> of those are sent in sequence, sometimes the drives will cluster them
> together in one read.
> c) A problem occurs, however, when your OS deals with 4K clusters but
> when you make the partition, the partition is offset! Imagine the
> physical read sectors of your disk looking like
>
> AAAAAAAABBBBBBBBCCCCCCCCDDDDDDDD
>
> but when you make your partitions, somehow you partitioned it
>
> ....YYYYYYYYZZZZZZZZWWWWWWWW....
>
> This is possible because the drive allows addressing by 512B chunks.
> So for some reason one of your partitions starts halfway inside a
> physical sector. What is the problem with this? Now suppose your OS
> sends data to be written to the ZZZZZZZZ block. If it were completely
> aligned, the drive would just move the head to the block and
> overwrite it with this information. But since half of the block is
> over the BBBB physical sector, and half over CCCC, what the disk now
> needs to do is to
>
> pass 1) read BBBBBBBB
> pass 2) modify the second half of BBBB to match the first half of ZZZZ
> pass 3) write BBBBBBBB
> pass 4) read CCCCCCCC
> pass 5) modify the first half of CCCC to match the second half of ZZZZ
> pass 6) write CCCCCCCC
>
> Or what is known as a read-modify-write operation. Thus the disk
> becomes a lot less efficient.
>
> ----------
>
> Now, I don't know if this is the actual problem causing your
> performance problems. But this may be it. When you use fdisk, it
> defaults to aligning the partition to cylinder boundaries, and uses the
> default (from ancient times) value of 63 x (512B sized) sectors per
> track. Since 63 is not evenly divisible by 8, you see that quite
> likely some of your partitions are not aligned to the physical sector
> boundaries.
>
> If you use cfdisk, you can try to change the geometry with the command
> g. Or you can use the command u to change the units used in the
> partitioning to either sectors or megabytes, and make sure your
> partition sizes are a multiple of 8 in the former, or an integer in
> the latter.
>
> Again, take what I wrote with a grain of salt: this information came
> from the research I did a little while back after reading the slashdot
> article on this 4K switch. Since this is my own understanding, it may
> not be completely correct.
>
> HTH,
>
> W
> --
> Willie W. Wong                                     wwong@math.princeton.edu
> Data aequatione quotcunque fluentes quantitae involvente fluxiones invenire
>         et vice versa   ~~~  I. Newton
>
>

Willie,
   Thanks. Your description above is pretty much consistent (I think)
with the information I found at the WD site explaining how the data is
physically packed on the drive. Since I have the OS set up on a
different drive, I was able to blow away all the partitions, so I
just created one large 1T partition. I don't think that deals with
the exact problem you outline, though.
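
A minimal sketch of why a single large partition can still be misaligned, following the BBBB/CCCC picture quoted above. It assumes 512B addressable sectors, 4K physical sectors, and 4K filesystem blocks. The start sector of 63 is an assumption based on the fdisk default described in the quote, and the function name is made up for illustration.

```python
LOGICAL = 512                    # bytes per addressable sector
PHYSICAL = 4096                  # bytes per physical sector
PER_PHYS = PHYSICAL // LOGICAL   # 8 addressable sectors per physical sector

def physical_sectors_touched(part_start_lba, block_index, block_size=4096):
    """Return the physical sector numbers that one filesystem block overlaps."""
    lbas_per_block = block_size // LOGICAL
    first_lba = part_start_lba + block_index * lbas_per_block
    last_lba = first_lba + lbas_per_block - 1
    return list(range(first_lba // PER_PHYS, last_lba // PER_PHYS + 1))

# Partition starting at sector 63: every 4K block straddles two physical sectors.
print(physical_sectors_touched(63, 0))   # [7, 8]
# Partition starting at sector 64: every 4K block sits in exactly one.
print(physical_sectors_touched(64, 0))   # [8]
```

A block that overlaps two physical sectors is the case that forces the read-modify-write passes listed in the quote.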

   I'll have to study how to change the geometry. I do see that cfdisk
is reporting 255/63/121601. Am I to choose a size that is smaller
than 63 but a multiple of 8, i.e. 56? And if I do that, does the
partitioning of the drive just ignore those last 7 sectors and reduce
capacity to 56/63, a loss of about 11%?
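
The arithmetic behind this question can be sketched as follows, taking its premise at face value (whether the drive would really give up those sectors is a separate matter that the code does not settle). It uses the 255 heads and 63 sectors/track that cfdisk reports. The helper names are hypothetical.

```python
HEADS = 255  # heads value from the cfdisk geometry 255/63/121601

def aligned_cylinders(sectors_per_track, count=16, heads=HEADS):
    """Of the first `count` cylinders, list those whose first sector
    number is a multiple of 8 (i.e. sits on a 4K boundary)."""
    cylinder_size = heads * sectors_per_track   # 512B sectors per cylinder
    return [c for c in range(count) if (c * cylinder_size) % 8 == 0]

def capacity_lost(new_spt, old_spt):
    """Fraction of each track given up if only new_spt of old_spt sectors were used."""
    return 1 - new_spt / old_spt

print(aligned_cylinders(63))                  # [0, 8]: only every 8th cylinder
print(len(aligned_cylinders(56)))             # 16: every cylinder boundary
print(len(aligned_cylinders(64)))             # 16: every cylinder boundary
print(round(capacity_lost(56, 63) * 100, 1))  # 11.1 (percent)
```

With 63 sectors per track a cylinder holds 255 x 63 = 16065 sectors, which leaves a remainder of 1 when divided by 8, so cylinder-aligned partitions land on a 4K boundary only at every eighth cylinder.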

   Or is it legal to push the number of sectors up to 64? I would have
thought that the sector count would be driven by really low level
formatting and I shouldn't be messing with that.

   Assuming I have done what you are suggesting, then with 7
blocks/track do I need to choose the starting position of each
partition so that it is aligned to the start of a new 8-sector block?
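
One way to pick such starting positions, independent of the track geometry, is to round each proposed start sector up to the next multiple of 8. This is only a sketch under the same 512B/4K assumption as the quoted explanation, and `align_up` and `is_aligned` are illustrative names, not part of any partitioning tool.

```python
def is_aligned(lba, multiple=8):
    """True if a 512B sector number falls on a 4K (8-sector) boundary."""
    return lba % multiple == 0

def align_up(lba, multiple=8):
    """Round a start sector up to the next multiple of `multiple`."""
    return ((lba + multiple - 1) // multiple) * multiple

for start in (63, 64, 65):
    print(start, is_aligned(start), align_up(start))
# 63 False 64
# 64 True 64
# 65 False 72
```

When working in sector units in the partitioning tool, the rounded value would be the start sector to enter.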

   It's very strange that the disk industry chose anything that's not
2^X but I guess they did.

   As per your and Volker's suggestions I'm going to study the proper
way to align partitions before I do anything more. I did find a small
program called 'fio' that does some interesting drive testing
including seek time testing. I need to study how to really use it
though. It can set up multiple threads to simulate loads that are more
real-world like.

   Thanks to you both for the responses.

Cheers,
Mark


