public inbox for gentoo-dev@lists.gentoo.org
From: "Michał Górny" <mgorny@gentoo.org>
To: gentoo-dev@lists.gentoo.org
Subject: Re: [gentoo-dev] New distfile mirror layout
Date: Sun, 20 Oct 2019 08:51:31 +0200
Message-ID: <752be6c75f337df8ee8124a804247d2fb27e73b4.camel@gentoo.org>
In-Reply-To: <2d15507e-98ad-9466-75b7-7e8268ef2eb9@gentoo.org>

On Sat, 2019-10-19 at 19:24 -0400, Joshua Kinard wrote:
> On 10/18/2019 09:41, Michał Górny wrote:
> > Hi, everybody.
> > 
> > It is my pleasure to announce that yesterday (EU) evening we've switched
> > to a new distfile mirror layout.  Users will be switching to the new
> > layout either as they upgrade Portage to 2.3.77 or -- if they upgraded
> > already -- as their caches expire (24hrs).
> > 
> > The new layout is mostly a bow towards mirror admins, for some of whom
> > having 60000+ files in a single directory has been a problem.
> > However, I suppose some of you also found e.g. the directory index
> > hardly usable due to its size.
> > 
> > Throughout a transitional period (whose exact length hasn't been decided
> > yet), both layouts will be available.  Afterwards, the old layout will
> > be removed from mirrors.  This has a few implications:
> > 
> > 1. Users who don't upgrade their package managers in time will lose
> > the ability to fetch from Gentoo mirrors.  This shouldn't be that
> > much of a problem given that the core software needed to upgrade Portage
> > should all have reliable upstream SRC_URIs.
> > 
> > 2. mirror://gentoo/file URIs will stop working.  While technically you
> > could use mirror://gentoo/XX/file, I'd rather recommend finally
> > discarding its usage and moving distfiles to devspace.
> > 
> > 3. Directly fetching files from distfiles.gentoo.org will become
> > a little harder.  To fetch a distfile named 'foo-1.tar.gz', you'd have
> > to use something like:
> > 
> > $ printf '%s' foo-1.tar.gz | b2sum | cut -c1-2
> > 1b
> > $ wget http://distfiles.gentoo.org/distfiles/1b/foo-1.tar.gz
> > ...
> > 
> > 
> > Alternatively, you can:
> > 
> > $ wget http://distfiles.gentoo.org/distfiles/INDEX
> > 
> > and grep for the right path there.  This INDEX is also a more
> > lightweight alternative to HTML indexes generated by the servers.
> > 
> > 
> > If you're interested in more background details and some plots, see [1].
> > 
> > [1] https://dev.gentoo.org/~mgorny/articles/improving-distfile-mirror-structure.html
> > 
> 
> So the answer I didn't really see directly stated here is, where do new
> distfiles need to go //now//?  E.g., if on woodpecker, I currently cp a
> distfile to /space/distfiles-local.  What is the new directory I need to
> use?  And if mirror://gentoo/${FOO} is going away, for the new distfiles
> target, what would be the applicable prefix to use?
> 
> Directly using devspace seems like a bad idea, IMHO.  Once long ago, we all
> got chastised for doing exactly that.  Too much possibility of fragmentation
> as devs retire or package maintainership changes hands.

Today you get chastised for using /space/distfiles-local and not
following policy changes.  The devmanual states that it has been
deprecated since at least 2011, and recommends using d.g.o instead [1].

> I looked at the whitepaper'ish-like writeup, and I kinda don't like using a
> hash-based naming scheme on the new distfiles layout.  I really kinda prefer
> breaking the directories up based on the first letter of the distfiles in
> question, factoring case-sensitivity in (so you'd have 52 top-level
> directories for A-Z and a-z, plus 10 more for 0-9).  Under each of those
> directories, additional subdirectories for the next few letters (say,
> letters 2-3).  Yes, this leads to some orphan cases where a distfile might
> live on its own, but from a direct navigation standpoint, it's easy to find
> for someone browsing the distfiles server and easy to predict where a
> distfile is at.
> 
> No math, statistical analysis, or deep-rooted knowledge of filesystems
> behind that paragraph.  Just a plain old unfiltered opinion.  Sometimes, I
> need to go get a distfile off the Gentoo mirrors, and being able to quickly
> find it in the mirror root is great.  Having to do hash calculations to work
> out the file path will be *really* annoying.

Your solution still doesn't solve the problem of having 8k-24k files
in a single directory, even if you use 7 letters of prefix.  So it just
creates a lot of tiny directory noise for no practical gain.

[1] https://devmanual.gentoo.org/general-concepts/mirrors/index.html#suitable-download-hosts
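
For anyone who would rather not do the hash calculation by hand, the
b2sum | cut pipeline quoted above can be wrapped in a few lines of Python.
This is only a rough sketch: it assumes the subdirectory is the first two
hex digits of the BLAKE2b-512 digest of the file name (b2sum's default
digest length), and the function names distfile_subdir and distfile_url
are made up for illustration, not part of Portage.

```python
import hashlib

# Base URL as used in the wget example quoted above.
MIRROR_BASE = "http://distfiles.gentoo.org/distfiles"


def distfile_subdir(filename: str) -> str:
    """Return the first two hex digits of the BLAKE2b digest of the name.

    Assumption: hashlib.blake2b's default 64-byte digest corresponds to
    what plain `b2sum` prints, and only the file name (no trailing
    newline) is hashed, as in `printf '%s' NAME | b2sum | cut -c1-2`.
    """
    digest = hashlib.blake2b(filename.encode("utf-8")).hexdigest()
    return digest[:2]


def distfile_url(filename: str, base: str = MIRROR_BASE) -> str:
    """Build the full mirror URL for a distfile under the hashed layout."""
    return f"{base}/{distfile_subdir(filename)}/{filename}"


if __name__ == "__main__":
    # Prints the URL that the quoted wget example fetches by hand.
    print(distfile_url("foo-1.tar.gz"))
```

The result is deterministic for a given file name, so the same helper
works against any mirror that carries the new layout; only the base URL
changes.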

-- 
Best regards,
Michał Górny

