From: Michał Górny
To: gentoo-dev@lists.gentoo.org
Subject: Re: [gentoo-dev] New distfile mirror layout
Date: Sun, 20 Oct 2019 10:32:45 +0200
Message-ID: <01086c53bfbf7702dac10b75a25927b62ef90b53.camel@gentoo.org>
In-Reply-To: <100ae6ba-fdd3-b697-0ccc-860c9b8e4521@gentoo.org>
Organization: Gentoo

On Sun, 2019-10-20 at 04:25 -0400, Joshua Kinard wrote:
> On 10/20/2019 02:51, Michał Górny wrote:
> > On Sat, 2019-10-19 at 19:24 -0400, Joshua Kinard wrote:
> > > On 10/18/2019 09:41, Michał Górny wrote:
> > > > Hi, everybody.
> > > >
> > > > It is my pleasure to announce that yesterday (EU) evening we've switched
> > > > to a new distfile mirror layout. Users will be switching to the new
> > > > layout either as they upgrade Portage to 2.3.77 or -- if they upgraded
> > > > already -- as their caches expire (24hrs).
> > > >
> > > > The new layout is mostly a bow towards mirror admins, for some of whom
> > > > having 60000+ files in a single directory has been a problem.
> > > > However, I suppose some of you also found e.g. the directory index
> > > > hardly usable due to its size.
> > > >
> > > > Throughout a transitional period (whose exact length hasn't been decided
> > > > yet), both layouts will be available. Afterwards, the old layout will
> > > > be removed from mirrors. This has a few implications:
> > > >
> > > > 1. Users who don't upgrade their package managers in time will lose
> > > > the ability to fetch from Gentoo mirrors. This shouldn't be that
> > > > much of a problem given that the core software needed to upgrade Portage
> > > > should all have reliable upstream SRC_URIs.
> > > >
> > > > 2. mirror://gentoo/file URIs will stop working. While technically you
> > > > could use mirror://gentoo/XX/file, I'd rather recommend finally
> > > > discarding its usage and moving distfiles to devspace.
> > > >
> > > > 3. Directly fetching files from distfiles.gentoo.org will become
> > > > a little harder. To fetch a distfile named 'foo-1.tar.gz', you'd have
> > > > to use something like:
> > > >
> > > > $ printf '%s' foo-1.tar.gz | b2sum | cut -c1-2
> > > > 1b
> > > > $ wget http://distfiles.gentoo.org/distfiles/1b/foo-1.tar.gz
> > > > ...
> > > >
> > > >
> > > > Alternatively, you can:
> > > >
> > > > $ wget http://distfiles.gentoo.org/distfiles/INDEX
> > > >
> > > > and grep for the right path there. This INDEX is also a more
> > > > lightweight alternative to HTML indexes generated by the servers.
> > > >
> > > >
> > > > If you're interested in more background details and some plots, see [1].
> > > >
> > > > [1] https://dev.gentoo.org/~mgorny/articles/improving-distfile-mirror-structure.html
> > > >
> > >
> > > So the answer I didn't really see directly stated here is, where do new
> > > distfiles need to go //now//? E.g., if on woodpecker, I currently cp a
> > > distfile to /space/distfiles-local. What is the new directory I need to
> > > use? And if mirror://gentoo/${FOO} is going away, for the new distfiles
> > > target, what would be the applicable prefix to use?
> > >
> > > Directly using devspace seems like a bad idea, IMHO. Once long ago, we all
> > > got chastised for doing exactly that. Too much possibility of fragmentation
> > > as devs retire or package maintainership changes hands.
> >
> > Today you get chastised for using /space/distfiles-local and not
> > following policy changes. The devmanual states that it has been deprecated
> > since at least 2011, and talks of using d.g.o [1].
>
> I don't recall this change being added as far back as 2011. Maybe my memory
> is bad, but if it was done that long ago, it was done quietly, and it was
> not enforced. I checked my local mailing list archives for gentoo-dev and
> don't see any mention of distfiles-local being deprecated back then. Why
> has it taken 8 years for this to get addressed?

Don't ask me. I think I was already taught to use d.g.o back when I was
recruited.

> In any event, I still think using devspace is a bad idea. A centralized
> distfiles repo is what most other distros use, and it's what we should use.

Talking doesn't make things happen. Coming up with good proposals that
address all the problems (e.g. those listed in devmanual) does.

> > > I looked at the whitepaper'ish-like writeup, and I kinda don't like using a
> > > hash-based naming scheme on the new distfiles layout. I really kind of prefer
> > > breaking the directories up based on the first letter of the distfiles in
> > > question, factoring case-sensitivity in (so you'd have 52 top-level
> > > directories for A-Z and a-z, plus 10 more for 0-9). Under each of those
> > > directories, additional subdirectories for the next few letters (say,
> > > letters 2-3). Yes, this leads to some orphan cases where a distfile might
> > > live on its own, but from a direct navigation standpoint, it's easy to find
> > > for someone browsing the distfiles server and easy to predict where a
> > > distfile is at.
> > >
> > > No math, statistical analysis, or deep-rooted knowledge of filesystems
> > > behind that paragraph. Just a plain old unfiltered opinion. Sometimes, I
> > > need to go get a distfile off the Gentoo mirrors, and being able to quickly
> > > find it in the mirror root is great. Having to do hash calculations to work
> > > out the file path will be *really* annoying.
> >
> > Your solution still doesn't solve the problem of having 8k-24k files
> > in a single directory, even if you use 7 letters of prefix. So it just
> > creates a lot of tiny directory noise for no practical gain.
>
> Why is having a max ~24k files in a directory a bad idea? Modern
> filesystems are more than capable of handling that.
>
> - ext4: unlimited files in a directory
> - xfs: virtually unlimited (hard limit of 2^64-1 total files per volume)
> - ntfs: 4,294,967,295
>
> And 24k is a bit more than 1/3rd of all distfiles that we currently have.

For the same reason having ~60k files in a directory was a problem.
There is really no point in changing anything if you change BIG_NUMBER
to SMALLER_BIG_NUMBER.

> Under which scenario do you wind up with 24k files in a single directory? I
> consider the tex package an outlier in this case (one package should not be
> the sole dictator of policy).

Three versions of TeXLive living simultaneously. If one package falls
completely out of bounds, no problem is solved by the change, so what's
the point of making it?

-- 
Best regards,
Michał Górny
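Editorial sketch, not part of the original message: the path lookup from item 3 of the quoted announcement, and the bucket-size argument made in the replies, can be illustrated in a few lines of Python. This only mirrors the `printf '%s' NAME | b2sum | cut -c1-2` pipeline shown above (b2sum and hashlib.blake2b both default to a 512-bit digest). The function names are hypothetical, and the generated file names are made-up stand-ins for a single large package family, not real distfile names.

```python
import hashlib
from collections import Counter

# Base URL taken from the wget example in the quoted announcement.
BASE_URL = "http://distfiles.gentoo.org/distfiles"


def hashed_prefix(filename: str) -> str:
    """First two hex digits of the BLAKE2b-512 digest of the file name.

    Equivalent to: printf '%s' NAME | b2sum | cut -c1-2
    """
    return hashlib.blake2b(filename.encode("utf-8")).hexdigest()[:2]


def hashed_url(filename: str) -> str:
    """Mirror URL of a distfile under the hashed layout."""
    return f"{BASE_URL}/{hashed_prefix(filename)}/{filename}"


def largest_bucket(names, key) -> int:
    """Size of the most crowded directory when names are grouped by key."""
    return max(Counter(key(name) for name in names).values())


if __name__ == "__main__":
    print(hashed_url("foo-1.tar.gz"))

    # Hypothetical names sharing one prefix, standing in for a package
    # family that ships thousands of similarly named distfiles.
    names = [f"texlive-module-{i}.tar.xz" for i in range(1000)]
    print("first-letter layout:", largest_bucket(names, lambda n: n[0]))
    print("hashed layout:      ", largest_bucket(names, hashed_prefix))
```

With a name-prefix scheme every one of these files lands in the same directory, whereas the two-digit hash spreads them over up to 256 directories regardless of how the names begin. That is the trade-off debated above: predictable browsing versus bounded directory size.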