Subject: Re: [gentoo-dev] New distfile mirror layout
To: gentoo-dev@lists.gentoo.org
From: Joshua Kinard
Date: Sun, 20 Oct 2019 16:57:54 -0400

On 10/20/2019 05:44, Michał Górny wrote:
> On Sun, 2019-10-20 at 05:21 -0400, Joshua Kinard wrote:
>> On 10/20/2019 04:32, Michał Górny wrote:
>>> On Sun, 2019-10-20 at 04:25 -0400, Joshua Kinard wrote:
>>>> Why is having a max ~24k files in a directory a bad idea? Modern
>>>> filesystems are more than capable of handling that.
>>>>
>>>> - ext4: unlimited files in a directory
>>>> - xfs: virtually unlimited (hard limit of 2^64-1 total files per volume)
>>>> - ntfs: 4,294,967,295
>>>>
>>>> And 24k is a bit more than 1/3rd of all distfiles that we currently have.
>>>
>>> For the same reason having ~60k files in a directory was a problem.
>>> There is really no point in changing anything if you change BIG_NUMBER
>>> to SMALLER_BIG_NUMBER.
>>
>> That doesn't answer my question. Why is it a problem? What criteria are
>> you using to decide that 24k is a "smaller big number"? Is there some issue
>> highlighted by the mirror admins where having 24k files in a single
>> directory offers no significant relief versus the current 60k files?
>
> IIRC Robin set the goal as:
>
> | the number of files in a single directory should not exceed 1000, [1]
>
> I don't recall how that number was chosen but it's probably pretty
> arbitrary. In any case, I can notice the difference between working
> with a listing of 1k files and 24k files, on the hardware running
> masterdist.

I think it would be prudent, then, to get some data that shows why that
number was chosen and add it to the GLEP, possibly as one of the references
at the bottom. Your personal observations of a system (masterdist) that few
of us have access to are not good enough, especially for future developers
who may revisit this topic long after you or I are gone.

>
>>>> Under which scenario do you wind up with 24k files in a single directory?
>>>> I consider the tex package an outlier in this case (one package should not be
>>>> the sole dictator of policy).
>>>
>>> Three versions of TeXLive living simultaneously. If one package falls
>>> completely out of bounds, no problem is solved by the change, so what's
>>> the point of making it?
>>
>> The problem in this case is with texlive, not our current, or future,
>> distfiles methodology.
>
> Is it? Are you suggesting we should ban upstream from using multiple
> distfiles with similar prefix? What about other potential packages that
> may suffer from the same problem in the future? Go packages have a good
> potential, given that majority of them starts with 'github.com'.

Please point out which of my words imply in any way that I want to ban
something. I simply said that texlive's large number of distfiles is a
problem. That doesn't mean I want to resolve the problem by banning it, or
future packages that use the same method.

My concern is that, out of the tens of thousands of packages we have, we're
allowing ONE package to dictate how we shape a major piece of Gentoo
infrastructure, and I don't feel that the proposed solution seeks to address
that. Rather, it seeks to band-aid it by wrapping the entire distro up like a
mummy.

>> Has anyone looked at how other distros deal with texlive?
>
> Other distros don't mirror original distfiles.

Has thought been given to doing the same? This is arguably a better approach
than mirroring original distfiles in devspace, and it would significantly
reduce the infrastructure burden on the project.

>> Has anyone complained or filed a bug to texlive developers
>> upstream about their excessive amount of distfiles and the burden it places
>> on distro maintainers?
>
> You believe it to be a problem. Don't expect others to bother upstream
> with your preferences.

Hah. So you consider texlive having 16k+ distfiles to be completely within
operating norms, then?
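
To make the "similar prefix" point concrete: below is a minimal Python sketch of a hypothetical layout where each distfile is placed in a directory named after the first N characters of its filename, checked against the 1000-files-per-directory goal quoted in this thread. The functions `bucket_by_prefix` and `oversized_dirs` and the sample file names and counts are made up for illustration; they are not the real texlive distfile list or any actual mirror tooling.

```python
from collections import Counter

# Goal quoted in this thread: no more than 1000 files per directory.
MAX_FILES_PER_DIR = 1000


def bucket_by_prefix(filenames, prefix_len=1):
    """Count how many files land in each directory when the directory name
    is the first `prefix_len` characters of the (lowercased) filename.
    Hypothetical layout, for illustration only."""
    return Counter(name[:prefix_len].lower() for name in filenames)


def oversized_dirs(counts, limit=MAX_FILES_PER_DIR):
    """Return only the directories whose file count exceeds `limit`."""
    return {d: n for d, n in counts.items() if n > limit}


if __name__ == "__main__":
    # Hypothetical sample: one package with many similarly prefixed
    # distfiles, plus 100 unrelated files for each of ten leading letters.
    tex = [f"texlive-module-pkg{i}-2019.tar.xz" for i in range(3000)]
    other = [f"{c}package-{i}.tar.gz" for c in "abcdefghij" for i in range(100)]
    files = tex + other

    for n in (1, 2, 8):
        print(n, oversized_dirs(bucket_by_prefix(files, n)))
```

In this made-up sample, every oversized directory comes from the one package sharing the `texlive-module-` prefix, so lengthening the prefix from 1 to 8 characters only renames the overloaded directory (`t`, `te`, `texlive-`) while the other directories stay at 100 files each.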

I took a quick look, and it seems the TeX project has a fairly comprehensive
mirroring system distributed around the world. In fact, they appear to
emulate Perl's CPAN system with "CTAN": https://ctan.org/

I don't know the history of the texlive and other associated tex packages in
Gentoo, but my guess is that, instead of doing what our Perl packages do,
someone just decided to mirror the CTAN archive directly on the Gentoo
distfiles system. It seems to me that what should actually happen is that we
leverage CTAN itself, much like CPAN, and use their mirroring system instead
of burdening our infrastructure as an unofficial CTAN archive.

I know we've got a ton of Perl packages for the core set of Perl modules, but
doesn't the CPAN eclass also have the capability to auto-generate an ebuild
for virtually any Perl package distributed via CPAN? Could that logic be
applied to the CTAN system in its own eclass, so that we can remove the 16k+
texlive modules from our mirrors completely? Or, at worst, we might just have
to generate ebuilds for texlive modules and treat them as discrete, installed
packages.

-- 
Joshua Kinard
Gentoo/MIPS
kumba@gentoo.org
rsa6144/5C63F4E3F5C6C943 2015-04-27
177C 1972 1FB8 F254 BAD0 3E72 5C63 F4E3 F5C6 C943

"The past tempts us, the present confuses us, the future frightens us. And
our lives slip away, moment by moment, lost in that vast, terrible in-between."

--Emperor Turhan, Centauri Republic