public inbox for gentoo-dev@lists.gentoo.org
From: Richard Yao <ryao@gentoo.org>
To: Jaco Kroon <jaco@uls.co.za>, gentoo-dev@lists.gentoo.org
Subject: ext4 readdir performance - was Re: [gentoo-dev] New distfile mirror layout
Date: Wed, 23 Oct 2019 19:47:25 -0400
Message-ID: <c23b3d2d-6164-d1ad-022c-560e203d7047@gentoo.org>
In-Reply-To: <73f461e5-d224-6aec-48be-f7e0cf8e077f@uls.co.za>



On 10/22/19 2:51 AM, Jaco Kroon wrote:
> Hi All,
> 
> 
> On 2019/10/21 18:42, Richard Yao wrote:
>>
>> If we consider the access frequency, it might actually not be that
>> bad. Consider a simple example with 500 files and two directory
>> buckets. If we have 250 in each, then the size of the directory is
>> always 250. However, if 50 files are accessed 90% of the time, then
>> putting 450 into one directory and that 50 into another directory, we
>> end up with the performance of the O(n) directory lookup being
>> consistent with there being only 90 files in each directory.
>>
>> I am not sure if we should be discarding all other considerations to
>> make changes to benefit O(n) directory lookup filesystems, but if we
>> are, then the hashing approach is not necessarily the best one. It is
>> only the best when all files are accessed with equal frequency, which
>> would be an incorrect assumption. A more human friendly approach might
>> still be better. I doubt that we have the data to determine that though.
>>
>> Also, another idea is to use a cheap hash function (e.g. fletcher) and
>> just have the mirrors do the hashing behind the scenes. Then we would
>> have the best of both worlds.
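For anyone who wants to check the arithmetic in my earlier example quoted above, here is a minimal sketch in Python. It assumes the cost of an O(n) lookup is proportional to the size of the directory being searched, and `expected_dir_size` is just a name I made up for illustration:

```python
def expected_dir_size(buckets):
    """Access-weighted average directory size.

    `buckets` is a list of (file_count, access_fraction) pairs, one per
    directory. The result is a rough proxy for the cost of an O(n) name
    lookup, assuming cost scales with the size of the directory searched.
    """
    total = sum(fraction for _, fraction in buckets)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("access fractions must sum to 1")
    return sum(count * fraction for count, fraction in buckets)


# Even split: 250 files per directory, each directory gets half the accesses.
even = expected_dir_size([(250, 0.5), (250, 0.5)])

# Skewed split: the 50 hot files (90% of accesses) get their own directory.
skewed = expected_dir_size([(50, 0.9), (450, 0.1)])

print(f"even split:   {even:.0f}")
print(f"skewed split: {skewed:.0f}")
```

The skewed split works out to 0.9 * 50 + 0.1 * 450 = 90, versus 250 for the even split.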
> 
> 
> Experience:
> 
> ext4 sucks at targeting name lookups without dir_index feature (O(n)
> lookups - scans all entries in the folder).  With dir_index readdir
> performance is crap.  Pick your poison I guess.  Most of our larger
> filesystems (2TB+, but especially the 80TB+ ones) we've reverted to
> disabling dir_index as the benefit is outweighed by the crappy readdir()
> and glob() performance.

My reading of the ext4 disk layout documentation is that readdir()
should work mostly the same way with or without dir_index. The
difference is that dir_index directories are larger on disk, both
because of the tree's metadata and because more of their blocks are only
partially filled, so there is more to read:

https://ext4.wiki.kernel.org/index.php/Ext4_Disk_Layout#Directory_Entries

The code itself is the same traversal code:

https://github.com/torvalds/linux/blob/v5.3/fs/ext4/dir.c#L106

However, a couple of things stand out to me here at a glance:

1. `cond_resched()` adds scheduler delay for no apparent reason.
`cond_resched()` is meant to be used in places where we could block
excessively on non-PREEMPT kernels, but this doesn't strike me as one of
those places. The fact that we block on disk on uncached reads naturally
serves the same purpose, so an explicit rescheduling point here is
redundant. PREEMPT kernels should perform better in readdir() on ext4 by
virtue of making `cond_resched()` a no-op.

2. Read-ahead is implemented in a way that appears to over-read the
directory whenever the needed information is not cached. This is
technically read-ahead, although it is not a great way of doing it. A
much better way would be to pipeline `readdir()` by initiating
asynchronous read operations in anticipation of future reads.

Both of these should affect both variants of ext4's directories, but the
penalties I mentioned earlier mean that the dir_index variant would be
affected more.

If you have a way to benchmark things, a simple idea to evaluate would
be deleting the `cond_resched()` line. If we had data showing an
improvement, I would be happy to send a small one-line patch deleting
the line to Ted to get the change into mainline.
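
If it helps, a rough userspace harness for that comparison could look like the sketch below. It is only a sketch under assumptions: the helper names (`make_test_dir`, `time_listing`) and the file names are made up, and it only times directory enumeration through `os.scandir()`, so it does not include the sorting and `stat()` work that `ls` may add. It also measures whatever cache state the system happens to be in; for cold-cache numbers you would need to drop caches (as root, `echo 3 > /proc/sys/vm/drop_caches`) between runs.

```python
import os
import tempfile
import time


def make_test_dir(parent, n_files):
    """Create n_files empty files in a new directory under parent."""
    path = tempfile.mkdtemp(dir=parent)
    for i in range(n_files):
        # Hypothetical names; only the number of entries matters here.
        with open(os.path.join(path, f"distfile-{i:06d}.tar.gz"), "w"):
            pass
    return path


def time_listing(path, repeats=3):
    """Return (entry_count, best_seconds) for enumerating path."""
    best = float("inf")
    count = 0
    for _ in range(repeats):
        start = time.perf_counter()
        with os.scandir(path) as entries:
            count = sum(1 for _ in entries)
        best = min(best, time.perf_counter() - start)
    return count, best


if __name__ == "__main__":
    # Pass dir=... to TemporaryDirectory() to place the test on the
    # filesystem under test; the default temp location may be tmpfs.
    with tempfile.TemporaryDirectory() as parent:
        target = make_test_dir(parent, 10_000)
        count, seconds = time_listing(target)
        print(f"{count} entries in {seconds * 1000:.2f} ms")
```

Running it on the same filesystem with a stock kernel and with the `cond_resched()` line removed (and, separately, with and without dir_index) would give at least a first-order comparison.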

> There doesn't seem to be a real specific tip-over point, and it seems to
> depend a lot on RAM availability and harddrive speed (obviously).  So if
> dentries gets cached, disk speeds becomes less of an issue.  However, on
> large folders (where I typically use 10k as a value for large based on
> "gut feeling" and "unquantifiable experience" and "nothing scientific at
> all") I find that even with lots of RAM two consecutive ls commands
> remains terribly slow. Switch off dir_index and that becomes an order of
> magnitude faster.
> 
> I don't have a great deal of experience with XFS, but on those systems
> where we do it's generally on a VM, and perceivably (again, not
> scientific) our experience has been that it feels slower.  Again, not
> scientific, just perception.
> 
> I'm in support for the change.  This will bucket to 256 folders and
> should have a reasonably even split between folders.  If required a
> second layer could be introduced by using the 3rd and 4th digits of the
> hash for a second layer.  Any hash should be fine, it really doesn't
> need to be cryptographically strong, it just needs to provide a good
> spread and be really fast.  Generally a hash table should have a prime
> number of buckets to assist with hash bias, but frankly, that's over
> complicating the situation here.
> 
> I also agree with others that it used to be easy to get distfiles as and
> when needed, so an alternative structure could mirror that of the
> portage tree itself, in other words "cat/pkg/distfile". This perhaps
> just shifts the issue:
> 
> jkroon@plastiekpoot /usr/portage $ find . -maxdepth 1 -type d -name
> "*-*" | wc -l
> 167
> jkroon@plastiekpoot /usr/portage $ find *-* -maxdepth 1 -type d | wc -l
> 19412
> jkroon@plastiekpoot /usr/portage $ for i in *-*; do echo $(find $i
> -maxdepth 1 -type d | wc -l) $i; done | sort -g | tail -n10
> 347 net-misc
> 373 media-sound
> 395 media-libs
> 399 dev-util
> 505 dev-libs
> 528 dev-java
> 684 dev-haskell
> 690 dev-ruby
> 1601 dev-perl
> 1889 dev-python
> 
> So that's average 116 sub folders under the top layer (only two over
> 1000), and then presumably less than 100 distfiles maximum per package? 
> Probably overkill but would (should) solve both the too many files per
> folder as well as the easy lookup by hand issue.
> 
> I don't have a preference on either solution though but do agree that
> "easy finding of distfiles" are handy.  The INDEX mechanism is fine for me.
> 
> Kind Regards,
> 
> Jaco
> 
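
For reference, the 256-bucket scheme described above (first two hex digits of a hash, with an optional second layer from the 3rd and 4th digits) can be sketched as follows. The choice of BLAKE2b from Python's `hashlib` is my assumption for illustration only; as noted, any fast hash with a good spread would do. The function names and the sample file names are hypothetical.

```python
import hashlib
from collections import Counter


def bucket_for(filename, levels=1):
    """Return the relative path for filename under a hashed layout.

    Each level consumes two hex digits of the digest, giving 256
    buckets per level. BLAKE2b is an arbitrary choice for this sketch.
    """
    digest = hashlib.blake2b(filename.encode("utf-8")).hexdigest()
    parts = [digest[2 * i:2 * i + 2] for i in range(levels)]
    return "/".join(parts + [filename])


def bucket_sizes(filenames, levels=1):
    """Count how many of the given files land in each bucket."""
    return Counter(
        bucket_for(name, levels).rsplit("/", 1)[0] for name in filenames
    )


if __name__ == "__main__":
    # Made-up file names, just to look at the spread across buckets.
    names = [f"pkg-{i}.tar.gz" for i in range(20_000)]
    sizes = bucket_sizes(names)
    print(bucket_for("example-1.0.tar.gz"))
    print(f"{len(sizes)} buckets, largest holds {max(sizes.values())} files")
```

With `levels=2` the same function yields paths of the form `ab/cd/filename`, which is the second layer mentioned above.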


