public inbox for gentoo-user@lists.gentoo.org
From: Mark Knecht <markknecht@gmail.com>
To: gentoo-user@lists.gentoo.org
Subject: Re: [gentoo-user] 1-Terabyte drives - 4K sector sizes? -> bar  performance so far
Date: Sun, 7 Feb 2010 13:42:18 -0800
Message-ID: <5bdc1c8b1002071342v6c81cf13gde7bcef72be5017b@mail.gmail.com>
In-Reply-To: <20100207193947.GB30196@math.princeton.edu>

On Sun, Feb 7, 2010 at 11:39 AM, Willie Wong <wwong@math.princeton.edu> wrote:
> On Sun, Feb 07, 2010 at 08:27:46AM -0800, Mark Knecht wrote:
>> <QUOTE>
>> 4KB physical sectors: KNOW WHAT YOU'RE DOING!
>>
>> Pros: Quiet, cool-running, big cache
>>
>> Cons: The 4KB physical sectors are a problem waiting to happen. If you
>> misalign your partitions, disk performance can suffer. I ran
>> benchmarks in Linux using a number of filesystems, and I found that
>> with most filesystems, read performance and write performance with
>> large files didn't suffer with misaligned partitions, but writes of
>> many small files (unpacking a Linux kernel archive) could take several
>> times as long with misaligned partitions as with aligned partitions.
>> WD's advice about who needs to be concerned is overly simplistic,
>> IMHO, and it's flat-out wrong for Linux, although it's probably
>> accurate for 90% of buyers (those who run Windows or Mac OS and use
>> their standard partitioning tools). If you're not part of that 90%,
>> though, and if you don't fully understand this new technology and how
>> to handle it, buy a drive with conventional 512-byte sectors!
>> </QUOTE>
>>
>>    Now, I don't mind getting a bit dirty learning to use this
>> correctly but I'm wondering what that means in a practical sense.
>> Reading the mke2fs man page the word 'sector' doesn't come up. It's my
>> understanding the Linux 'blocks' are groups of sectors. True? If the
>> disk must use 4K sectors then what - the smallest block has to be 4K
>> and I'm using 1 sector per block? It seems that ext3 doesn't support
>> anything larger than 4K?
>
> The problem is not when you are making the filesystem with mke2fs, but
> when you partitioned the disk using fdisk. I'm sure I am making some
> small mistakes in the explanation below, but it goes something like
> this:
>
> a) The harddrive with 4K sectors allows the head to efficiently
> read/write 4K sized blocks at a time.
> b) However, to be compatible in hardware, the harddrive allows 512B
> sized blocks to be addressed. In reality, this means that you can
> individually address the 8 512B-sized chunks of the 4K sized blocks,
> but each will count as a separate operation. To illustrate: say the
> hardware has some sector X of size 4K. It has 8 addressable slots
> inside X1 ... X8 each of size 512B. If your OS clusters read/writes on
> the 512B level, it will send 8 commands to read the info in those 8
> blocks separately. If your OS clusters in 4K, it will send one
> command. So in the stupid analysis I give here, it will take 8 times
> as long for the 512B addressing to read the same data, since it will
> take 8 passes, and each time inefficiently reading only 1/8 of the
> data required. Now in reality, drives are smarter than that: if all 8
> of those are sent in sequence, sometimes the drives will cluster them
> together in one read.
> c) A problem occurs, however, when your OS deals with 4K clusters but
> when you make the partition, the partition is offset! Imagine the
> physical read sectors of your disk looking like
>
> AAAAAAAABBBBBBBBCCCCCCCCDDDDDDDD
>
> but when you make your partitions, somehow you partitioned it
>
> ....YYYYYYYYZZZZZZZZWWWWWWWW....
>
> This is possible because the drive allows addressing by 512B chunks.
> So for some reason one of your partitions starts halfway inside a
> physical sector. What is the problem with this? Now suppose your OS
> sends data to be written to the ZZZZZZZZ block. If it were completely
> aligned, the drive would just move the head to the block and
> overwrite it with this information. But since half of the block is
> over the BBBB physical sector, and half over CCCC, what the disk now
> needs to do is to
>
> pass 1) read BBBBBBBB
> pass 2) modify the second half of BBBB to match the first half of ZZZZ
> pass 3) write BBBBBBBB
> pass 4) read CCCCCCCC
> pass 5) modify the first half of CCCC to match the second half of ZZZZ
> pass 6) write CCCCCCCC
>
> Or what is known as a read-modify-write operation. Thus the disk
> becomes a lot less efficient.
>
> ----------
>
> Now, I don't know if this is the actual problem causing your
> performance problems. But this may be it. When you use fdisk, it
> defaults to aligning the partition to cylinder boundaries, and uses the
> default (from ancient times) value of 63 x (512B sized) sectors per
> track. Since 63 is not evenly divisible by 8, you see that quite
> likely some of your partitions are not aligned to the physical sector
> boundaries.
>
> If you use cfdisk, you can try to change the geometry with the command
> g. Or you can use the command u to change the units used in the
> partitioning to either sectors or megabytes, and make sure your
> partition sizes are a multiple of 8 in the former, or an integer in
> the latter.
>
> Again, take what I wrote with a grain of salt: this information came
> from the research I did a little while back after reading the slashdot
> article on this 4K switch. So being my own understanding, it may not
> completely be correct.
>
> HTH,
>
> W
> --
> Willie W. Wong                                     wwong@math.princeton.edu
> Data aequatione quotcunque fluentes quantitae involvente fluxiones invenire
>         et vice versa   ~~~  I. Newton
>

Hi Willie,
   OK - it turns out that if I start fdisk using the -u option it shows
me sector numbers. The original partition, created using just the
default values, had a starting sector of 63 - probably about the
worst value it could be. As a test I blew away that partition and
created a new one starting at 64 instead and the untar results are
vastly improved - down to roughly 20 seconds from 8-10 minutes. That's
roughly twice as fast as the old 120GB SATA2 drive I was using to test
the system out while I debugged this issue.
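
As a rough sketch of why 63 is such a bad starting sector and 64 is
fine, the alignment check can be written out in a few lines of Python.
The 4096-byte physical sector and the 4096-byte filesystem block are
assumptions here (fdisk only reports the 512-byte logical sector), and
the function names are made up for illustration:

```python
LOGICAL = 512    # bytes per addressable sector, as reported by fdisk
PHYSICAL = 4096  # bytes per physical sector (assumed for this drive)
BLOCK = 4096     # filesystem block size (assumed)


def is_aligned(start_sector):
    """True if the partition starts on a physical sector boundary."""
    return (start_sector * LOGICAL) % PHYSICAL == 0


def physical_sectors_touched(start_sector, block_index=0):
    """Count the physical sectors covered by one filesystem block."""
    first_byte = start_sector * LOGICAL + block_index * BLOCK
    last_byte = first_byte + BLOCK - 1
    return last_byte // PHYSICAL - first_byte // PHYSICAL + 1


for start in (63, 64):
    print(start, is_aligned(start), physical_sectors_touched(start))
```

With a start of 63 every 4K block straddles two physical sectors, so
each small write can turn into two read-modify-write cycles. With a
start of 64 each block maps onto exactly one physical sector.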

   There's still some variability but there are probably other things
running on the box - screen savers and stuff - that account for some
of that.

   I'm still a little fuzzy about what happens to the extra sectors at
the end of a track. Are they used and I pay for a little bit of
overhead reading data off of them or are they ignored and I lose
capacity? I think it must be the former as my partition isn't all that
much less than 1TB.
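
The capacity side of that question can at least be checked by
arithmetic. A small sketch using only the numbers from the fdisk
output further down (the variable names are just for illustration, and
it says nothing about how the drive lays out its tracks internally):

```python
SECTOR = 512                  # bytes per logical sector (from fdisk)
disk_bytes = 1000204886016    # "Disk /dev/sdb" size in bytes
start, end = 64, 1953525167   # Start and End columns for /dev/sdb1

partition_sectors = end - start + 1
partition_bytes = partition_sectors * SECTOR
unused_bytes = disk_bytes - partition_bytes
blocks_1k = partition_bytes // 1024  # fdisk "Blocks" are 1 KiB units

print(partition_sectors, partition_bytes, blocks_1k, unused_bytes)
```

The only space left outside the partition is the 64 sectors in front
of it, i.e. 32 KiB out of 1 TB, and the computed block count matches
the 976762552 that fdisk prints.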

   Again, many thanks to you and Volker for pointing this issue out.

Cheers,
Mark

gandalf TestMount # fdisk -u /dev/sdb

The number of cylinders for this disk is set to 121601.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
   (e.g., DOS FDISK, OS/2 FDISK)

Command (m for help): p

Disk /dev/sdb: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders, total 1953525168 sectors
Units = sectors of 1 * 512 = 512 bytes
Disk identifier: 0x67929f10

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1              64  1953525167   976762552   83  Linux

Command (m for help): q

gandalf TestMount # df -H
Filesystem             Size   Used  Avail Use% Mounted on
/dev/sda3              110G   8.6G    96G   9% /
udev                    11M   177k    11M   2% /dev
shm                    2.0G      0   2.0G   0% /dev/shm
/dev/sdb1              985G   210M   935G   1% /mnt/TestMount
gandalf TestMount #



gandalf TestMount # mkdir usr
gandalf TestMount # time tar xjf /portage-latest.tar.bz2 -C /mnt/TestMount/usr

real	0m23.275s
user	0m8.614s
sys	0m2.644s
gandalf TestMount # time rm -rf /mnt/TestMount/usr/

real	0m3.720s
user	0m0.118s
sys	0m1.822s
gandalf TestMount # mkdir usr
gandalf TestMount # time tar xjf /portage-latest.tar.bz2 -C /mnt/TestMount/usr

real	0m13.828s
user	0m8.911s
sys	0m2.653s
gandalf TestMount # time rm -rf /mnt/TestMount/usr/

real	0m19.718s
user	0m0.128s
sys	0m2.025s
gandalf TestMount # mkdir usr
gandalf TestMount # time tar xjf /portage-latest.tar.bz2 -C /mnt/TestMount/usr

real	0m25.777s
user	0m8.579s
sys	0m2.660s
gandalf TestMount # time rm -rf /mnt/TestMount/usr/

real	0m2.564s
user	0m0.112s
sys	0m1.805s
gandalf TestMount #
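
For reference, a quick way to summarize the three untar runs above and
compare them with the 8-10 minutes seen on the misaligned partition.
This is only arithmetic on the timings already shown; the 8 and 10
minute bounds are the rough figures from the message, not measured
values:

```python
tar_runs = [23.275, 13.828, 25.777]  # "real" seconds from the runs above
mean_s = sum(tar_runs) / len(tar_runs)

# Rough bounds for the misaligned case, in seconds.
before_low, before_high = 8 * 60, 10 * 60
speedup_low = before_low / mean_s
speedup_high = before_high / mean_s

print(f"mean untar time: {mean_s:.2f} s")
print(f"speedup: {speedup_low:.0f}x to {speedup_high:.0f}x")
```

The mean comes out at about 21 seconds, which is where the "roughly 20
seconds" figure comes from.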


