From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from lists.gentoo.org (pigeon.gentoo.org [208.92.234.80]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by finch.gentoo.org (Postfix) with ESMTPS id 12D49139118 for ; Wed, 23 Oct 2019 23:47:46 +0000 (UTC)
Received: from pigeon.gentoo.org (localhost [127.0.0.1]) by pigeon.gentoo.org (Postfix) with SMTP id F2481E0866; Wed, 23 Oct 2019 23:47:31 +0000 (UTC)
Received: from smtp.gentoo.org (smtp.gentoo.org [140.211.166.183]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by pigeon.gentoo.org (Postfix) with ESMTPS id 59385E0857 for ; Wed, 23 Oct 2019 23:47:31 +0000 (UTC)
Received: from [IPv6:2001:470:1f07:93::2] (home.zettabytesoftware.com [IPv6:2001:470:1f07:93::2]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) (Authenticated sender: ryao) by smtp.gentoo.org (Postfix) with ESMTPSA id D5EC134C2A6; Wed, 23 Oct 2019 23:47:29 +0000 (UTC)
Subject: ext4 readdir performance - was Re: [gentoo-dev] New distfile mirror layout
To: Jaco Kroon , gentoo-dev@lists.gentoo.org
References: <752be6c75f337df8ee8124a804247d2fb27e73b4.camel@gentoo.org> <73f461e5-d224-6aec-48be-f7e0cf8e077f@uls.co.za>
From: Richard Yao
Openpgp: preference=signencrypt
Autocrypt: addr=ryao@gentoo.org; prefer-encrypt=mutual
Message-ID:
Date: Wed, 23 Oct 2019 19:47:25 -0400
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.7.2
Precedence: bulk
List-Post:
List-Help:
List-Unsubscribe:
List-Subscribe:
List-Id: Gentoo Linux mail
X-BeenThere: gentoo-dev@lists.gentoo.org
Reply-to: gentoo-dev@lists.gentoo.org
X-Auto-Response-Suppress: DR, RN, NRN, OOF, AutoReply
MIME-Version: 1.0
In-Reply-To: <73f461e5-d224-6aec-48be-f7e0cf8e077f@uls.co.za>
Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="qwGc6plWUayvsJmPnOQBTavfa2GPcPjVi"
X-Archives-Salt: dbacda68-d35a-4346-93a4-420f98e1487f
X-Archives-Hash: 6c33552a873dffe7cc0778cf56e95968

This is an OpenPGP/MIME signed message (RFC 4880 and 3156)
--qwGc6plWUayvsJmPnOQBTavfa2GPcPjVi
Content-Type: multipart/mixed; boundary="o5xv94D6kxdUlLGaLgnGyb1XGxWzaFvSX"; protected-headers="v1"

From: Richard Yao
To: Jaco Kroon , gentoo-dev@lists.gentoo.org
Message-ID:
Subject: ext4
readdir performance - was Re: [gentoo-dev] New distfile mirror layout
References: <752be6c75f337df8ee8124a804247d2fb27e73b4.camel@gentoo.org> <73f461e5-d224-6aec-48be-f7e0cf8e077f@uls.co.za>
In-Reply-To: <73f461e5-d224-6aec-48be-f7e0cf8e077f@uls.co.za>

--o5xv94D6kxdUlLGaLgnGyb1XGxWzaFvSX
Content-Type: text/plain; charset=utf-8
Content-Language: en-GB
Content-Transfer-Encoding: quoted-printable

On 10/22/19 2:51 AM, Jaco Kroon wrote:
> Hi All,
>
>
> On 2019/10/21 18:42, Richard Yao wrote:
>>
>> If we consider the access frequency, it might actually not be that
>> bad. Consider a simple example with 500 files and two directory
>> buckets. If we have 250 in each, then the size of the directory is
>> always 250. However, if 50 files are accessed 90% of the time, then
>> putting 450 into one directory and that 50 into another directory, we
>> end up with the performance of the O(n) directory lookup being
>> consistent with there being only 90 files in each directory.
>>
>> I am not sure if we should be discarding all other considerations to
>> make changes to benefit O(n) directory lookup filesystems, but if we
>> are, then the hashing approach is not necessarily the best one. It is
>> only the best when all files are accessed with equal frequency, which
>> would be an incorrect assumption. A more human friendly approach might
>> still be better. I doubt that we have the data to determine that though.
>>
>> Also, another idea is to use a cheap hash function (e.g. fletcher) and
>> just have the mirrors do the hashing behind the scenes. Then we would
>> have the best of both worlds.
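
The arithmetic in the quoted 500-file example can be checked with a short sketch. The helper name `expected_scan_size` and the (size, share) representation are made up for illustration. The model simply weights each directory's size by the fraction of lookups that land in it, and assumes a lookup cost proportional to the size of the directory being searched.

```python
def expected_scan_size(buckets):
    """Access-weighted directory size for an O(n) lookup.

    `buckets` is a list of (files_in_directory, share_of_accesses) pairs.
    The shares are assumed to sum to 1.
    """
    return sum(size * share for size, share in buckets)


# Even split: 250 files per directory, each receiving half of the accesses.
even_split = expected_scan_size([(250, 0.5), (250, 0.5)])

# Skewed split: the 50 hot files (90% of accesses) in one directory,
# the remaining 450 files (10% of accesses) in the other.
skewed_split = expected_scan_size([(50, 0.9), (450, 0.1)])

print(f"even split:   {even_split:.0f} entries")
print(f"skewed split: {skewed_split:.0f} entries")
```

Under these assumptions the skewed split behaves like a directory of about 90 entries (0.9 * 50 + 0.1 * 450), against 250 for the even split, which is where the figure of 90 in the quoted text comes from.
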
>
>
> Experience:
>
> ext4 sucks at targeting name lookups without dir_index feature (O(n)
> lookups - scans all entries in the folder).  With dir_index readdir
> performance is crap.  Pick your poison I guess.  Most of our larger
> filesystems (2TB+, but especially the 80TB+ ones) we've reverted to
> disabling dir_index as the benefit is outweighed by the crappy readdir()
> and glob() performance.

My read of the ext4 disk layout documentation is that the read operation
should work mostly the same way, except with a penalty from reading
larger directories caused by the addition of the tree's metadata and
from having more partially filled blocks:

https://ext4.wiki.kernel.org/index.php/Ext4_Disk_Layout#Directory_Entries

The code itself is the same traversal code:

https://github.com/torvalds/linux/blob/v5.3/fs/ext4/dir.c#L106

However, a couple of things stand out to me here at a glance:

1. `cond_resched()` adds scheduler delay for no apparent reason.
`cond_resched()` is meant to be used in places where we could block
excessively on non-PREEMPT kernels, but this doesn't strike me as one of
those places. The fact that we block on disk on uncached reads naturally
serves the same purpose, so an explicit rescheduling point here is
redundant. PREEMPT kernels should perform better in readdir() on ext4 by
virtue of making `cond_resched()` a no-op.

2. Read-ahead is implemented in a way that appears to be over-reading
the directory whenever the needed information is not cached. This is
technically read-ahead, although it is not a great way of doing it. A
much better way to do this would be to pipeline `readdir()` by
initiating asynchronous read operations in anticipation of future reads.

Both of these should affect both variants of ext4's directories, but the
penalties I mentioned earlier mean that the dir_index variant would be
affected more.

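
For anyone who wants numbers rather than impressions, here is a rough user-space timing sketch. It is only an illustration, not a proper benchmark: the function name `time_directory_listing` is made up, it uses Python's `os.scandir()` as a stand-in for a plain `readdir()` loop, and it does nothing about cache state, so the first run and the repeat runs may be measuring different things.

```python
import os
import time


def time_directory_listing(path, repeats=3):
    """Time full listings of `path`.

    Returns (entry_count, timings), where timings holds one wall-clock
    duration in seconds per run. No cache control is attempted here.
    """
    timings = []
    entry_count = 0
    for _run in range(repeats):
        start = time.perf_counter()
        with os.scandir(path) as entries:
            # Walk every entry without calling stat() on it, so the
            # measurement stays close to the directory read itself.
            entry_count = sum(1 for _entry in entries)
        timings.append(time.perf_counter() - start)
    return entry_count, timings


# Example: list the current directory three times.
count, timings = time_directory_listing(".")
print(f"{count} entries")
for run, seconds in enumerate(timings, start=1):
    print(f"run {run}: {seconds:.6f} s")
```

Pointing this at the same large directory on a filesystem with and without dir_index, or on kernels with and without the `cond_resched()` call, would give a first comparison. Repeat runs that stay slow would match the "two consecutive ls commands" observation.
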
If you have a way to benchmark things, a simple idea to evaluate would
be deleting the `cond_resched()` line. If we had data showing an
improvement, I would be happy to send a small one-line patch deleting
the line to Ted to get the change into mainline.

> There doesn't seem to be a real specific tip-over point, and it seems to
> depend a lot on RAM availability and harddrive speed (obviously).  So if
> dentries gets cached, disk speeds becomes less of an issue.  However, on
> large folders (where I typically use 10k as a value for large based on
> "gut feeling" and "unquantifiable experience" and "nothing scientific at
> all") I find that even with lots of RAM two consecutive ls commands
> remains terribly slow. Switch off dir_index and that becomes an order of
> magnitude faster.
>
> I don't have a great deal of experience with XFS, but on those systems
> where we do it's generally on a VM, and perceivably (again, not
> scientific) our experience has been that it feels slower.  Again, not
> scientific, just perception.
>
> I'm in support for the change.  This will bucket to 256 folders and
> should have a reasonably even split between folders.  If required a
> second layer could be introduced by using the 3rd and 4th digits of the
> hash for a second layer.  Any hash should be fine, it really doesn't
> need to be cryptographically strong, it just needs to provide a good
> spread and be really fast.  Generally a hash table should have a prime
> number of buckets to assist with hash bias, but frankly, that's over
> complicating the situation here.
>
> I also agree with others that it used to be easy to get distfiles as and
> when needed, so an alternative structure could mirror that of the
> portage tree itself, in other words "cat/pkg/distfile". This perhaps
> just shifts the issue:
>
> jkroon@plastiekpoot /usr/portage $ find . -maxdepth 1 -type d -name
> "*-*" | wc -l
> 167
> jkroon@plastiekpoot /usr/portage $ find *-* -maxdepth 1 -type d | wc -l
> 19412
> jkroon@plastiekpoot /usr/portage $ for i in *-*; do echo $(find $i
> -maxdepth 1 -type d | wc -l) $i; done | sort -g | tail -n10
> 347 net-misc
> 373 media-sound
> 395 media-libs
> 399 dev-util
> 505 dev-libs
> 528 dev-java
> 684 dev-haskell
> 690 dev-ruby
> 1601 dev-perl
> 1889 dev-python
>
> So that's average 116 sub folders under the top layer (only two over
> 1000), and then presumably less than 100 distfiles maximum per package?
> Probably overkill but would (should) solve both the too many files per
> folder as well as the easy lookup by hand issue.
>
> I don't have a preference on either solution though but do agree that
> "easy finding of distfiles" are handy.  The INDEX mechanism is fine for me.
>
> Kind Regards,
>
> Jaco
>

--o5xv94D6kxdUlLGaLgnGyb1XGxWzaFvSX--

--qwGc6plWUayvsJmPnOQBTavfa2GPcPjVi
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: OpenPGP digital signature
Content-Disposition: attachment; filename="signature.asc"

-----BEGIN PGP SIGNATURE-----

iQIzBAEBCAAdFiEErwOv9RRoojY8ouC1/btjJpNedloFAl2w5o0ACgkQ/btjJpNe
dlpBhg//Z90VYNBvB5vHVnc9Z9beR6H9QTyr+x+JCjnRqHmleEZ9tQfbUC0bihsi
csPvf22PyuC6fsKEmxGMLJG3zqZHi0kp6TLU2TUZl8AxrEQVcYnj2BlkcT91ZkQ5
+TPD9Czs/3wO132Fopav07wjV8tXdgsbSedQFwbfHCyFZFKDn18GIuCJem3fGXpM
P3F067NsngZ9bqvSXqeX0MQApHJs2aAnPMgOx6c2jO2iqP2trVDuI4FXEFONJpVX
APzy8CRFkhVojrDG2QWtZQLXXo0wPWaK8PMEUHL57khYLqlcRJ+kyUjDVhL27b+J
MjNpTnP5I8L+3Ms3XTk1PZV3Vep13YYnrzbESHpVczwkUJOcDcM49sKr7jcFgIMo
ZlM5FEpD8lTMVLH11uQ+5chaNqXI+a2lMvxNCTT/DMg1e9QJw1TGPVSAuD2tO3jZ
trfbkWFbOlRtHQpwrkyhxUuDOIzQo7tqT5+32gymoL2JBzuC1SbzYZxYLYVmcuBD
Tt4SSFxML3opj4UQxk+9M6qQVuebzlgbImrUocvtxkgYGF8JXmRFTh6UX8v5aEe4
NtUuCrG1mCO/jRZr/pUghjid+Qa1iSjsYthoZOGtgQcdfSPtDyUEO4L8edaUPuRt
EWBKKXqNGD3a/J82Oa0IJn2pkTyD1AZYWXW3wm5fdiyEojKUC0g=
=uzsb
-----END PGP SIGNATURE-----

--qwGc6plWUayvsJmPnOQBTavfa2GPcPjVi--