From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from lists.gentoo.org (pigeon.gentoo.org [208.92.234.80])
    (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
    (No client certificate requested)
    by finch.gentoo.org (Postfix) with ESMTPS id CF36B138334
    for ; Sat, 19 Oct 2019 20:02:48 +0000 (UTC)
Received: from pigeon.gentoo.org (localhost [127.0.0.1])
    by pigeon.gentoo.org (Postfix) with SMTP id F3F26E0905;
    Sat, 19 Oct 2019 20:02:44 +0000 (UTC)
Received: from smtp.gentoo.org (smtp.gentoo.org [140.211.166.183])
    (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
    (No client certificate requested)
    by pigeon.gentoo.org (Postfix) with ESMTPS id 9E837E08AD
    for ; Sat, 19 Oct 2019 20:02:44 +0000 (UTC)
Received: from pomiot (c134-66.icpnet.pl [85.221.134.66])
    (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
    (No client certificate requested)
    (Authenticated sender: mgorny)
    by smtp.gentoo.org (Postfix) with ESMTPSA id F061534C06D;
    Sat, 19 Oct 2019 20:02:42 +0000 (UTC)
Message-ID:
Subject: Re: [gentoo-dev] New distfile mirror layout
From: Michał Górny
To: gentoo-dev@lists.gentoo.org
Date: Sat, 19 Oct 2019 22:02:38 +0200
In-Reply-To:
References: <43F5DC91-E5D0-40EA-A1DA-A354C1CB7A16@gentoo.org>
Organization: Gentoo
Content-Type: multipart/signed; micalg="pgp-sha512";
    protocol="application/pgp-signature"; boundary="=-9qcRfPyFrdAzTSBERGLQ"
User-Agent: Evolution 3.32.4
Precedence: bulk
List-Post:
List-Help:
List-Unsubscribe:
List-Subscribe:
List-Id: Gentoo Linux mail
X-BeenThere: gentoo-dev@lists.gentoo.org
Reply-to: gentoo-dev@lists.gentoo.org
X-Auto-Response-Suppress: DR, RN, NRN, OOF, AutoReply
MIME-Version: 1.0
X-Archives-Salt: e22d3f2d-03ac-473f-8ba9-b0708dfefe56
X-Archives-Hash: 347e67d9093549469d2c39300e87d45b

--=-9qcRfPyFrdAzTSBERGLQ
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

On Sat, 2019-10-19 at 15:26 -0400, Richard Yao wrote:
> > On Oct 18, 2019, at 9:10 PM, Richard Yao wrote:
> >
> > > > On Oct 18, 2019, at 4:49 PM, Michał Górny wrote:
> > > On Fri, 2019-10-18 at 15:53 -0400, Richard Yao wrote:
> > > > > > > > > On Oct 18, 2019, at 9:42 AM, Michał Górny wrote:
> > > > > > > > Hi, everybody.
> > > > > > > > It is my pleasure to announce that yesterday (EU) evening we've switched
> > > > > > > > to a new distfile mirror layout. Users will be switching to the new
> > > > > > > > layout either as they upgrade Portage to 2.3.77 or -- if they upgraded
> > > > > > > > already -- as their caches expire (24hrs).
> > > > > > > > The new layout is mostly a bow towards mirror admins, for some of whom
> > > > > > > > having 60000+ files in a single directory has been a problem.
> > > > > > > > However, I suppose some of you also found e.g. the directory index
> > > > > > > > hardly usable due to its size.
> > > > This sounds like a filesystem issue. Do we know which filesystems are suffering?
> > > > ZFS should be fine. I believe ext2/ext3 have problems with this many files. ext4 is probably okay, but don’t quote me on that.
> > > Ext2, VFAT and NTFS were mentioned on the bug [1], though I suppose this
> > > may apply only to older ntfs versions. NFS has been mentioned too.
> >
> > ext2 and vfat are not surprises to me (outside of the idea that anyone would use them for a mirror). NTFS and NFS are though.
> > > However, just because modern filesystems can handle them efficiently, it
> > > doesn't mean having directories that huge comes with zero cost.
> > While I am okay with the change, what do you mean when you say that having huge directories does not come with zero cost?
> >
> > Filesystems with O(1) directory lookups like ZFS would probably be hurt by this, but the impact should be negligible. Filesystems with O(log n) directory lookups would see faster directory lookups.
> >
> > Outside of directory lookups, this could speed up searches and sort operations when listing everything with just about any filesystem benefiting from the improvement.
> >
> > Listing directories on such filesystems should not benefit from this unless you are using ls where the default behavior is to sort the directory contents (which is where the improvement when sorting comes into play). The need to sort the directory contents by default keeps ls from displaying anything until it has scanned the entire directory. The asymptotic complexity of a fast comparison-based sort improves in this situation from O(n log n) to O(n log(n/b)) provided that you sort each subdirectory independently. A further speed up could be obtained by doing multithreading to parallelize the sort operations.
> I read your original email late at night and I misread the description of how this works.
>
> At an initial glance, I thought we were doing a prefix approach (with the caveat that buckets are unbalanced). In reality, we are doing a cryptographic hash of the filenames.
>
> That would keep all buckets balanced, which gives the best directory lookup times on O(log n) lookup filesystems, but I think there is something to be gained from using the less optimal approach of using filename prefixes:
>
> * some regex searches on distfiles can be accelerated
> * generating a sorted list of all distfiles becomes asymptotically faster
> * it is easy for a user to find all versions of a given distfile
> * no need to calculate a cryptographic hash
>
> I realize that I am late to propose it, but could we consider a switch to this alternative arrangement?

No, we can't. Please read either the original discussion on the bug,
or the linked article. It's explained in detail why this won't work.

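The two arrangements discussed above can be compared with a short sketch. It is only an illustration: the thread says file names are bucketed by a cryptographic hash, but the use of BLAKE2b and of the first two hex digits as the directory name is an assumption here (the bug and the linked article have the real parameters). The helpers `hash_bucket`, `prefix_bucket` and `bucket_sizes`, and the sample file names, are hypothetical.

```python
import hashlib
from collections import Counter


def hash_bucket(filename: str) -> str:
    """Directory name taken from a cryptographic hash of the file name.

    Assumption: BLAKE2b, keeping the first two hex digits (256 buckets).
    """
    digest = hashlib.blake2b(filename.encode("utf-8")).hexdigest()
    return digest[:2]


def prefix_bucket(filename: str) -> str:
    """Directory name taken from the first character of the file name."""
    return filename[:1].lower()


def bucket_sizes(filenames, bucket_fn):
    """Count how many files land in each bucket."""
    return Counter(bucket_fn(name) for name in filenames)


# Hypothetical distfile names; a real mirror holds 60000+ files.
filenames = (
    [f"python-3.{i}.tar.xz" for i in range(600)]
    + [f"perl-5.{i}.tar.gz" for i in range(300)]
    + [f"zlib-1.{i}.tar.gz" for i in range(100)]
)

prefix_sizes = bucket_sizes(filenames, prefix_bucket)
hash_sizes = bucket_sizes(filenames, hash_bucket)

print("prefix buckets:", dict(prefix_sizes))
print("hash buckets used:", len(hash_sizes))
print("largest hash bucket:", max(hash_sizes.values()))
```

With these sample names the prefix scheme puts 900 of the 1000 files under `p` and 100 under `z`, while the hash scheme spreads them over up to 256 directories. The price is that all versions of one package no longer sit next to each other, which is the convenience the prefix proposal was after.
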
-- 
Best regards,
Michał Górny

--=-9qcRfPyFrdAzTSBERGLQ
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: This is a digitally signed message part
Content-Transfer-Encoding: 7bit

-----BEGIN PGP SIGNATURE-----

iQGTBAABCgB9FiEEx2qEUJQJjSjMiybFY5ra4jKeJA4FAl2ra95fFIAAAAAALgAo
aXNzdWVyLWZwckBub3RhdGlvbnMub3BlbnBncC5maWZ0aGhvcnNlbWFuLm5ldEM3
NkE4NDUwOTQwOThEMjhDQzhCMjZDNTYzOUFEQUUyMzI5RTI0MEUACgkQY5ra4jKe
JA4MdAf/ZIP1XInNi6Wsa7g6AtGfB7n8rmqzA+x0St85lAga4q4JYDmQKKL8YBCP
vrNOgy2ufTkrz9O3Xn4CTdNsX7PAdjHmx57pPBRY98f5t+als98Na5GTmlbV9NCi
8HXz3UOxMxP+L0FGPDEczFAs3ecM8eaN7fZr1PjT4kBpIFafXuuh0a3sjAgLm7Z0
kchEBcT+0V75KN8eqgAqPeYwCAWGd4wx43Kn2BqxXbuTojAa9gIk3LM/VxyEpFqo
+sjNe4w3w2V5YRyLwRvKzryu8IDmF+kq4mfXnsIQKFDI2vF6AKM4n5v/UVsXXsGT
TbLdF3Ew/Ol5VvRzPvRbD2tjopJgPg==
=2wg/
-----END PGP SIGNATURE-----

--=-9qcRfPyFrdAzTSBERGLQ--