From: Mark Knecht
To: gentoo-user@lists.gentoo.org
Subject: Re: [gentoo-user] 1-Terabyte drives - 4K sector sizes? -> bar performance so far
Date: Sun, 7 Feb 2010 12:31:24 -0800
Message-ID: <5bdc1c8b1002071231pd809728y69e4f5e7eede9918@mail.gmail.com>
In-Reply-To: <20100207193947.GB30196@math.princeton.edu>
References: <5bdc1c8b1002070827i14f59047k39a695900ebe9889@mail.gmail.com> <20100207193947.GB30196@math.princeton.edu>

On Sun, Feb 7, 2010 at 11:39 AM, Willie Wong wrote:
> On Sun, Feb 07, 2010 at 08:27:46AM -0800, Mark Knecht wrote:
>>
>> 4KB physical sectors: KNOW WHAT YOU'RE DOING!
>>
>> Pros: Quiet, cool-running, big cache
>>
>> Cons: The 4KB physical sectors are a problem waiting to happen. If you
>> misalign your partitions, disk performance can suffer. I ran
>> benchmarks in Linux using a number of filesystems, and I found that
>> with most filesystems, read performance and write performance with
>> large files didn't suffer with misaligned partitions, but writes of
>> many small files (unpacking a Linux kernel archive) could take several
>> times as long with misaligned partitions as with aligned partitions.
>> WD's advice about who needs to be concerned is overly simplistic,
>> IMHO, and it's flat-out wrong for Linux, although it's probably
>> accurate for 90% of buyers (those who run Windows or Mac OS and use
>> their standard partitioning tools). If you're not part of that 90%,
>> though, and if you don't fully understand this new technology and how
>> to handle it, buy a drive with conventional 512-byte sectors!
>>
>>
>>    Now, I don't mind getting a bit dirty learning to use this
>> correctly but I'm wondering what that means in a practical sense.
>> Reading the mke2fs man page the word 'sector' doesn't come up. It's my
>> understanding the Linux 'blocks' are groups of sectors. True? If the
>> disk must use 4K sectors then what - the smallest block has to be 4K
>> and I'm using 1 sector per block?
>> It seems that ext3 doesn't support
>> anything larger than 4K?
>
> The problem is not when you are making the filesystem with mke2fs, but
> when you partitioned the disk using fdisk. I'm sure I am making some
> small mistakes in the explanation below, but it goes something like
> this:
>
> a) The harddrive with 4K sectors allows the head to efficiently
> read/write 4K sized blocks at a time.
> b) However, to be compatible in hardware, the harddrive allows 512B
> sized blocks to be addressed. In reality, this means that you can
> individually address the 8 512B-sized chunks of the 4K sized blocks,
> but each will count as a separate operation. To illustrate: say the
> hardware has some sector X of size 4K. It has 8 addressable slots
> inside X1 ... X8 each of size 512B. If your OS clusters read/writes on
> the 512B level, it will send 8 commands to read the info in those 8
> blocks separately. If your OS clusters in 4K, it will send one
> command. So in the stupid analysis I give here, it will take 8 times
> as long for the 512B addressing to read the same data, since it will
> take 8 passes, and each time inefficiently reading only 1/8 of the
> data required. Now in reality, drives are smarter than that: if all 8
> of those are sent in sequence, sometimes the drives will cluster them
> together in one read.
> c) A problem occurs, however, when your OS deals with 4K clusters but
> when you make the partition, the partition is offset! Imagine the
> physical read sectors of your disk looking like
>
> AAAAAAAABBBBBBBBCCCCCCCCDDDDDDDD
>
> but when you make your partitions, somehow you partitioned it
>
> ....YYYYYYYYZZZZZZZZWWWWWWWW....
>
> This is possible because the drive allows addressing by 512B chunks.
> So for some reason one of your partitions starts halfway inside a
> physical sector. What is the problem with this? Now suppose your OS
> sends data to be written to the ZZZZZZZZ block.
> If it were completely
> aligned, the drive will just go kink-move the head to the block, and
> overwrite it with this information. But since half of the block is
> over the BBBB physical sector, and half over CCCC, what the disk now
> needs to do is to
>
> pass 1) read BBBBBBBB
> pass 2) modify the second half of BBBB to match the first half of ZZZZ
> pass 3) write BBBBBBBB
> pass 4) read CCCCCCCC
> pass 5) modify the first half of CCCC to match the second half of ZZZZ
> pass 6) write CCCCCCCC
>
> Or what is known as a read-modify-write operation. Thus the disk
> becomes a lot less efficient.
>
> ----------
>
> Now, I don't know if this is the actual problem causing your
> performance problems. But this may be it. When you use fdisk, it
> defaults to aligning the partition to cylinder boundaries, and uses the
> default (from ancient times) value of 63 x (512B sized) sectors per
> track. Since 63 is not evenly divisible by 8, you see that quite
> likely some of your partitions are not aligned to the physical sector
> boundaries.
>
> If you use cfdisk, you can try to change the geometry with the command
> g. Or you can use the command u to change the units used in the
> partitioning to either sectors or megabytes, and make sure your
> partition sizes are a multiple of 8 in the former, or an integer in
> the latter.
>
> Again, take what I wrote with a grain of salt: this information came
> from the research I did a little while back after reading the slashdot
> article on this 4K switch. So being my own understanding, it may not
> completely be correct.
>
> HTH,
>
> W
> --
> Willie W. Wong                                     wwong@math.princeton.edu
> Data aequatione quotcunque fluentes quantitae involvente fluxiones invenire
>         et vice versa   ~~~  I. Newton
>
>

Willie, Thanks.
Your description above is pretty much consistent (I think) with the
information I found at the WD site explaining how the data is being
physically packed on the drive.

Being that I have the OS set up on a different drive I was able to
blow away all the partitions so I just created 1 large 1T partition
but I think that doesn't deal with the exact problem you outline.

I'll have to study how to change the geometry. I do see that cfdisk is
reporting 255/63/121601. Am I to choose a size that is __smaller__ than
63 but a multiple of 8? I.e. - 56? And then if I do that does the
partitioning of the drive just ignore those last 7 sectors and reduce
capacity to 56/63 of the total, i.e. by about 11%? Or is it legal to
push the number of sectors up to 64? I would have thought that the
sector count would be driven by really low level formatting and I
shouldn't be messing with that.

Assuming I have done what you are suggesting then with 7 blocks/track
then I need to choose the starting positions of each partition to be
aligned to the start of a new 8-sector block? It's very strange that
the disk industry chose anything that's not 2^X but I guess they did.

As per your and Volker's suggestions I'm going to study the proper way
to align partitions before I do anything more.

I did find a small program called 'fio' that does some interesting
drive testing including seek time testing. I need to study how to
really use it though. It can set up multiple threads to simulate loads
that are more real-world like.

Thanks to you both for the responses.

Cheers,
Mark
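
For anyone following the thread, the alignment arithmetic discussed
above can be sketched in a few lines of Python. This is only an
illustration, assuming 512-byte addressable sectors, 4096-byte physical
sectors and 4K filesystem blocks; the function names (is_aligned,
next_aligned, physical_sectors_touched) are made up for this example
and do not come from fdisk, cfdisk or any other tool.

```python
LOGICAL = 512                  # bytes per addressable (logical) sector
PHYSICAL = 4096                # bytes per physical sector (assumed 4K drive)
RATIO = PHYSICAL // LOGICAL    # 8 logical sectors per physical sector


def is_aligned(start_sector):
    """True if a partition starting at this 512B sector begins on a 4K boundary."""
    return start_sector % RATIO == 0


def next_aligned(start_sector):
    """Smallest 512B sector number >= start_sector that sits on a 4K boundary."""
    return -(-start_sector // RATIO) * RATIO


def physical_sectors_touched(start_sector, block_index, block_size=4096):
    """How many physical sectors one filesystem block overlaps.

    start_sector is the partition start in 512B sectors; block_index is
    the block's position inside the partition. A result of 2 means the
    drive has to read-modify-write two physical sectors for one write.
    """
    first_byte = start_sector * LOGICAL + block_index * block_size
    last_byte = first_byte + block_size - 1
    return last_byte // PHYSICAL - first_byte // PHYSICAL + 1


if __name__ == "__main__":
    # 63 is the traditional first-partition start with 63 sectors/track.
    for start in (63, 64):
        print(start, is_aligned(start), physical_sectors_touched(start, 0))
```

With a start sector of 63 every 4K block straddles two physical
sectors, while a start of 64 (or any other multiple of 8) keeps each
block inside one, which is the case Willie's BBBB/CCCC picture
describes.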