On Sun, 2019-10-20 at 04:25 -0400, Joshua Kinard wrote:
> On 10/20/2019 02:51, Michał Górny wrote:
> > On Sat, 2019-10-19 at 19:24 -0400, Joshua Kinard wrote:
> > > On 10/18/2019 09:41, Michał Górny wrote:
> > > > Hi, everybody.
> > > >
> > > > It is my pleasure to announce that yesterday (EU) evening we've switched
> > > > to a new distfile mirror layout. Users will be switching to the new
> > > > layout either as they upgrade Portage to 2.3.77 or -- if they upgraded
> > > > already -- as their caches expire (24hrs).
> > > >
> > > > The new layout is mostly a bow towards mirror admins, for some of whom
> > > > having 60000+ files in a single directory has been a problem.
> > > > However, I suppose some of you also found e.g. the directory index
> > > > hardly usable due to its size.
> > > >
> > > > Throughout a transitional period (whose exact length hasn't been decided
> > > > yet), both layouts will be available. Afterwards, the old layout will
> > > > be removed from mirrors. This has a few implications:
> > > >
> > > > 1. Users who don't upgrade their package managers in time will lose
> > > > the ability to fetch from Gentoo mirrors. This shouldn't be that
> > > > much of a problem given that the core software needed to upgrade Portage
> > > > should all have reliable upstream SRC_URIs.
> > > >
> > > > 2. mirror://gentoo/file URIs will stop working. While technically you
> > > > could use mirror://gentoo/XX/file, I'd rather recommend finally
> > > > discarding its usage and moving distfiles to devspace.
> > > >
> > > > 3. Directly fetching files from distfiles.gentoo.org will become
> > > > a little harder. To fetch a distfile named 'foo-1.tar.gz', you'd have
> > > > to use something like:
> > > >
> > > > $ printf '%s' foo-1.tar.gz | b2sum | cut -c1-2
> > > > 1b
> > > > $ wget http://distfiles.gentoo.org/distfiles/1b/foo-1.tar.gz
> > > > ...
> > > >
> > > > Alternatively, you can:
> > > >
> > > > $ wget http://distfiles.gentoo.org/distfiles/INDEX
> > > >
> > > > and grep for the right path there. This INDEX is also a more
> > > > lightweight alternative to HTML indexes generated by the servers.
> > > >
> > > > If you're interested in more background details and some plots, see [1].
> > > >
> > > > [1] https://dev.gentoo.org/~mgorny/articles/improving-distfile-mirror-structure.html
> > > >
> > >
> > > So the answer I didn't really see directly stated here is, where do new
> > > distfiles need to go //now//? E.g., if on woodpecker, I currently cp a
> > > distfile to /space/distfiles-local. What is the new directory I need to
> > > use? And if mirror://gentoo/${FOO} is going away, for the new distfiles
> > > target, what would be the applicable prefix to use?
> > >
> > > Directly using devspace seems like a bad idea, IMHO. Once long ago, we all
> > > got chastised for doing exactly that. Too much possibility of fragmentation
> > > as devs retire or package maintainership changes hands.
> >
> > Today you get chastised for using /space/distfiles-local and not
> > following policy changes. The devmanual states that it's deprecated
> > since at least 2011, and talks of using d.g.o [1].
>
> I don't recall this change being added as far back as 2011. Maybe my memory
> is bad, but if it was done that long ago, it was done quietly, and it was
> not enforced. I checked my local mailing list archives for gentoo-dev and
> don't see any mention of distfiles-local being deprecated back then. Why
> has it taken 8 years for this to get addressed?

Don't ask me. I think I was already taught to use d.g.o back when I was
recruited.

> In any event, I still think using devspace is a bad idea. A centralized
> distfiles repo is what most other distros use, and it's what we should use.

Talking doesn't make things happen. Coming up with good proposals that
address all the problems (e.g.
those listed in devmanual) does.

> > > I looked at the whitepaper'ish-like writeup, and I kinda don't like using a
> > > hash-based naming scheme on the new distfiles layout. I really kinda prefer
> > > breaking the directories up based on the first letter of the distfiles in
> > > question, factoring case-sensitivity in (so you'd have 52 top-level
> > > directories for A-Z and a-z, plus 10 more for 0-9). Under each of those
> > > directories, additional subdirectories for the next few letters (say,
> > > letters 2-3). Yes, this leads to some orphan cases where a distfile might
> > > live on its own, but from a direct navigation standpoint, it's easy to find
> > > for someone browsing the distfiles server and easy to predict where a
> > > distfile is at.
> > >
> > > No math, statistical analysis, or deep-rooted knowledge of filesystems
> > > behind that paragraph. Just a plain old unfiltered opinion. Sometimes, I
> > > need to go get a distfile off the Gentoo mirrors, and being able to quickly
> > > find it in the mirror root is great. Having to do hash calculations to work
> > > out the file path will be *really* annoying.
> >
> > Your solution still doesn't solve the problem of having 8k-24k files
> > in a single directory, even if you use 7 letters of prefix. So it just
> > creates a lot of tiny directory noise for no practical gain.
>
> Why is having a max ~24k files in a directory a bad idea? Modern
> filesystems are more than capable of handling that.
>
> - ext4: unlimited files in a directory
> - xfs: virtually unlimited (hard limit of 2^64-1 total files per volume)
> - ntfs: 4,294,967,295
>
> And 24k is a bit more than 1/3rd of all distfiles that we currently have.

For the same reason having ~60k files in a directory was a problem.
There is really no point in changing anything if you change BIG_NUMBER
to SMALLER_BIG_NUMBER.

> Under which scenario do you wind up with 24k files in a single directory?
> I consider the tex package an outlier in this case (one package should not be
> the sole dictator of policy).

Three versions of TeXLive living simultaneously. If one package falls
completely out of bounds, no problem is solved by the change, so what's
the point of making it?

--
Best regards,
Michał Górny
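
For anyone who finds the manual lookup from point 3 of the quoted
announcement tedious, it can be wrapped in a small shell helper. This is
only a sketch: distfile_url is a made-up name (not part of Portage), and it
assumes b2sum from GNU coreutils is installed and that the mirror uses the
two-hex-digit directory scheme from the quoted example.

```shell
#!/bin/sh
# Sketch only: distfile_url is a hypothetical helper, not a Portage tool.
# It prints the new-layout URL for a given distfile name.
distfile_url() {
    name=$1
    # The directory is the first two hex digits of the BLAKE2b hash of the
    # file *name* (not the file contents), as in the quoted example.
    dir=$(printf '%s' "$name" | b2sum | cut -c1-2)
    printf 'http://distfiles.gentoo.org/distfiles/%s/%s\n' "$dir" "$name"
}

# Example: print the URL for the distfile used in the announcement.
distfile_url foo-1.tar.gz
```

The result can be passed straight to a fetcher, e.g.
wget "$(distfile_url foo-1.tar.gz)".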