From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from lists.gentoo.org (pigeon.gentoo.org [208.92.234.80])
    (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
    (No client certificate requested)
    by finch.gentoo.org (Postfix) with ESMTPS id 8FCFD138334
    for ; Sat, 19 Oct 2019 06:17:29 +0000 (UTC)
Received: from pigeon.gentoo.org (localhost [127.0.0.1])
    by pigeon.gentoo.org (Postfix) with SMTP id 39E00E0961;
    Sat, 19 Oct 2019 06:17:25 +0000 (UTC)
Received: from smtp.gentoo.org (smtp.gentoo.org [140.211.166.183])
    (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
    (No client certificate requested)
    by pigeon.gentoo.org (Postfix) with ESMTPS id D39C7E0900
    for ; Sat, 19 Oct 2019 06:17:24 +0000 (UTC)
Received: from pomiot (c134-66.icpnet.pl [85.221.134.66])
    (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
    (No client certificate requested)
    (Authenticated sender: mgorny)
    by smtp.gentoo.org (Postfix) with ESMTPSA id 64C2534C033;
    Sat, 19 Oct 2019 06:17:23 +0000 (UTC)
Message-ID:
Subject: Re: [gentoo-dev] New distfile mirror layout
From: =?UTF-8?Q?Micha=C5=82_G=C3=B3rny?=
To: gentoo-dev@lists.gentoo.org
Date: Sat, 19 Oct 2019 08:17:19 +0200
In-Reply-To: <43F5DC91-E5D0-40EA-A1DA-A354C1CB7A16@gentoo.org>
References: <02507080f1e18d0382f551239819fb784cb0c05d.camel@gentoo.org>
    <43F5DC91-E5D0-40EA-A1DA-A354C1CB7A16@gentoo.org>
Organization: Gentoo
Content-Type: multipart/signed; micalg="pgp-sha512";
    protocol="application/pgp-signature"; boundary="=-qcTgfXD7VRav/SenMz1O"
User-Agent: Evolution 3.32.4
Precedence: bulk
List-Post:
List-Help:
List-Unsubscribe:
List-Subscribe:
List-Id: Gentoo Linux mail
X-BeenThere: gentoo-dev@lists.gentoo.org
Reply-to: gentoo-dev@lists.gentoo.org
X-Auto-Response-Suppress: DR, RN, NRN, OOF, AutoReply
MIME-Version: 1.0
X-Archives-Salt: 86b1bcf2-a06e-4c75-aef3-00063642a627
X-Archives-Hash: caf827ae62da07053d61f256762fe403

--=-qcTgfXD7VRav/SenMz1O
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

On Fri, 2019-10-18 at 21:09 -0400, Richard Yao wrote:
> > On Oct 18, 2019, at 4:49 PM, Michał Górny wrote:
> >
> > On Fri, 2019-10-18 at 15:53 -0400, Richard Yao wrote:
> > > > > > > On Oct 18, 2019, at 9:42 AM, Michał Górny wrote:
> > > > > > Hi, everybody.
> > > > > > It is my pleasure to announce that yesterday (EU) evening we've switched
> > > > > > to a new distfile mirror layout. Users will be switching to the new
> > > > > > layout either as they upgrade Portage to 2.3.77 or -- if they upgraded
> > > > > > already -- as their caches expire (24hrs).
> > > > > > The new layout is mostly a bow towards mirror admins, for some of whom
> > > > > > having a 60000+ files in a single directory have been a problem.
> > > > > > However, I suppose some of you also found e.g. the directory index
> > > > > > hardly usable due to its size.
> > > This sounds like a filesystem issue. Do we know which filesystems are suffering?
> > > ZFS should be fine. I believe ext2/ext3 have problems with this many files. ext4 is probably okay, but don’t quote me on that.
> >
> > Ext2, VFAT and NTFS were mentioned on the bug [1], though I suppose this
> > may apply only to older ntfs versions. NFS has been mentioned too.
>
> ext2 and vfat are not surprises to me (outside of the idea that anyone would use them for a mirror). NTFS and NFS are though.

Are you surprised that people use NTFS on Windows? Or that they use
local mirrors over NFS? The latter still needs to be addressed
separately, provided that they mount it on DISTDIR.

> > However, just because modern filesystems can handle them efficiently, it
> > doesn't mean having directories that huge comes with zero cost.
> While I am okay with the change, what do you mean when you say that having huge directories does not come with zero cost?
>
> Filesystems with O(1) directory lookups like ZFS would probably be hurt by this

O(1) or O(n)?

> , but the impact should be negligible. Filesystems with O(log n) directory lookups would see faster directory lookups.
>
> Outside of directory lookups, this could speed up searches and sort operations when listing everything with just about any filesystem benefiting from the improvement.
>
> Listing directories on such filesystems should not benefit from this unless you are using ls where the default behavior is to sort the directory contents (which is where the improvement when sorting comes into play). The need to sort the directory contents by default keeps ls from displaying anything until it has scanned the entire directory. The asymptotic complexity of a fast comparison based sort improves in this situation from O(n log n) to O(n log(n/b)) provided that you sort each subdirectory independently. A further speed up could be obtained by doing multithreading to parallelize the sort operations.
>
> Since I know someone will call me out on that comment, I will explain. Each bucket has roughly n/b items in it where n is the total number and b is the number of buckets. Sorting one bucket is O(n/b * log(n/b)). Loop to sort each of the b buckets. The buckets are pre-sorted by prefix, so the result is now sorted. You therefore get O(n log(n/b)) time complexity out of an O(n log n) comparison sort on this very special case where you call it multiple times on data that has been presorted by prefix into buckets.
>
> Is there any other benefit to this or did I get everything?

Listings for individual directories won't cause major pain to browsers
anymore. Not that there's much reason to do them. All kinds of
per-directory operations will consume less memory and be potentially
faster.
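
For what it's worth, the bucketed-sort argument quoted above can be
checked with a small Python sketch. It is only an illustration under
stated assumptions: the function names, the one-character prefix and
the bucket count b = 256 are made up for the example and say nothing
about how the mirrors or Portage actually split files. Only n = 60000
comes from this thread.

```python
import math
from collections import defaultdict


def bucket_sort_by_prefix(names, prefix_len=1):
    """Group names by their leading characters, sort each bucket on its
    own, then concatenate the buckets in prefix order.

    prefix_len is a hypothetical parameter chosen for illustration.
    """
    buckets = defaultdict(list)
    for name in names:
        buckets[name[:prefix_len]].append(name)

    result = []
    # Buckets are visited in prefix order, so the concatenation of the
    # individually sorted buckets is sorted as a whole.
    for prefix in sorted(buckets):
        result.extend(sorted(buckets[prefix]))
    return result


def comparison_estimates(n, b):
    """Return the rough comparison counts (n*log2(n), n*log2(n/b)) for
    one big sort versus b evenly filled buckets sorted separately."""
    return n * math.log2(n), n * math.log2(n / b)


if __name__ == "__main__":
    names = ["zlib-1.2.11.tar.gz", "bash-5.0.tar.gz", "zstd-1.4.3.tar.gz",
             "binutils-2.32.tar.xz", "a", "ab", ""]
    print(bucket_sort_by_prefix(names) == sorted(names))

    # n is from the thread; b = 256 is an assumed bucket count.
    full, bucketed = comparison_estimates(60000, 256)
    print(f"one directory: ~{full:.0f}, bucketed: ~{bucketed:.0f}")
```

The difference between the two estimates is n * log2(b), so with the
assumed 256 evenly filled buckets each name costs about 8 fewer
comparisons. The constant factor shrinks, but the work still grows
with n.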
-- 
Best regards,
Michał Górny

--=-qcTgfXD7VRav/SenMz1O
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: This is a digitally signed message part
Content-Transfer-Encoding: 7bit

-----BEGIN PGP SIGNATURE-----

iQGTBAABCgB9FiEEx2qEUJQJjSjMiybFY5ra4jKeJA4FAl2qqm9fFIAAAAAALgAo
aXNzdWVyLWZwckBub3RhdGlvbnMub3BlbnBncC5maWZ0aGhvcnNlbWFuLm5ldEM3
NkE4NDUwOTQwOThEMjhDQzhCMjZDNTYzOUFEQUUyMzI5RTI0MEUACgkQY5ra4jKe
JA4JFAf9EJUvK8Fwi7lBWpWANDjzwiAyWrneVwycuY1IcATc407jHHNd5QNE2CFI
1tb7o18LxsKTias7+zM8ewvU3Q4o+fzHStP5JNm1CE9TgpGyUQgS/uWL3Z0ELn73
LVxnNQ2rZdJfKv3rvq9TEWGD7oftzTVsydAkVXYoWzMtR0P9066MMHyEXL90aXzf
LWUjG0mXRRq1wM1Nkxi8yzG+yvzrYZyYDWgOG18+30Y19l1Ai036e3Jg7Z9ok5JO
9ks+SgW+mrQftf3IPXfa+xZ6x/IJpbvG3ngQHyHs469VsJgKc/MoAEo4Lx9BhEW/
PFXLv9S+1He6ng0YeIh6ZbqZArQ2+Q==
=kj3x
-----END PGP SIGNATURE-----

--=-qcTgfXD7VRav/SenMz1O--