From: Rich Freeman <rich0@gentoo.org>
To: gentoo-dev <gentoo-dev@lists.gentoo.org>
Subject: Re: [gentoo-dev] New distfile mirror layout
Date: Tue, 22 Oct 2019 21:21:12 -0400
Message-ID: <CAGfcS_=_dVopi1BNNGgvTQNSkM_yLC7ZTUbQ43uPHCwgB=Oxhw@mail.gmail.com>
In-Reply-To: <F5C72C3C-3264-43F4-962B-5A89F0E33A8E@gentoo.org>
On Mon, Oct 21, 2019 at 12:42 PM Richard Yao <ryao@gentoo.org> wrote:
>
> Also, another idea is to use a cheap hash function (e.g. fletcher) and just have the mirrors do the hashing behind the scenes. Then we would have the best of both worlds.

I think something that is getting missed in this discussion is that we
don't control all of our mirrors, and they're generally donated
resources. Somebody has some webserver, and they stick a Debian
mirror in one directory tree, and an Arch one in another, and they're
kind enough to give us one too.

That is why odder setups like NTFS and so on keep being mentioned.
They're not necessarily even running Linux, let alone ZFS or some
other optimized filesystem. And their webserver might be set up to
serve browsable directory indexes, which could perform terribly even
if the filesystem itself handles direct filename lookups fine. It
doesn't matter if you have hashed b-trees or whatever for filename
lookups if you're going to ask the filesystem for a list of every
file in a large directory - it has to traverse whatever data structure
it uses in its entirety to do so.

If we want to start putting requirements on hosting a mirror, then
we'll end up with fewer mirrors, and with mirrors, more is usually
better. Ideally a mirror should just be a black box to us - we don't
really care what they're running because we don't depend on any mirror
individually. Likewise, if we negatively impact mirror hosts we'll end
up with fewer mirrors. Sure, maybe those hosts have odd
configurations, but we're still better off with them than without.
That said, we do seem to have a lot of mirrors, so it probably isn't
the end of the world if we lose a few.

And there is nothing to say that we can't have an infra mirror set up
for interactive browsing, one that we don't have people fetch from,
which dispenses with all the hashing or bins by the first letter of
the filename, etc. It seems like most of the use cases where hashing
is inconvenient are more casual ones.
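
For illustration, a minimal sketch of the two binning schemes mentioned
above. It assumes a GLEP 75 style layout in which the subdirectory is
the leading 8 bits (two hex digits) of a BLAKE2B hash of the filename;
this message itself doesn't specify the hash or the cutoff, so treat
that as an assumption. hashed_dir, first_letter_dir and mirror_path are
hypothetical helpers, not Portage APIs.

```python
import hashlib
from collections import Counter


def hashed_dir(filename: str, bits: int = 8) -> str:
    """Subdirectory under an assumed hashed layout: leading `bits` of
    BLAKE2B(filename), as lowercase hex. Assumes `bits` is a multiple of 4
    so it maps onto whole hex digits."""
    digest = hashlib.blake2b(filename.encode("utf-8")).hexdigest()
    return digest[: bits // 4]


def first_letter_dir(filename: str) -> str:
    """Subdirectory under a hypothetical browse-friendly layout:
    the first character of the filename, lowercased."""
    return filename[:1].lower()


def mirror_path(filename: str, hashed: bool = True) -> str:
    """Relative path a client would request for `filename` under either layout."""
    subdir = hashed_dir(filename) if hashed else first_letter_dir(filename)
    return f"{subdir}/{filename}"


if __name__ == "__main__":
    # Hypothetical sample distfile names, for illustration only.
    names = ["linux-5.3.tar.xz", "libpng-1.6.37.tar.xz", "libX11-1.6.9.tar.bz2",
             "python-3.8.0.tar.xz", "perl-5.30.0.tar.xz", "gcc-9.2.0.tar.xz"]
    for n in names:
        print(mirror_path(n), mirror_path(n, hashed=False))
    # Letter bins cluster on common prefixes ("l" and "p" here), while
    # hash bins are expected to spread names roughly uniformly.
    print(Counter(first_letter_dir(n) for n in names))
    print(Counter(hashed_dir(n) for n in names))
```

A fetch client only needs the filename to compute the hashed path, but
a person browsing the hashed tree can't guess which directory to open,
which is exactly the inconvenience a separate browse-only mirror would
avoid.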

To avoid another reply, people are talking about having utilities that
can fetch distfiles using the new scheme. I'd think that "ebuild
foo.ebuild fetch" is probably the simplest solution for this. Chances
are that you're dealing with SRC_URI strings that have variable
substitution in them anyway, so just letting ebuild do the fetching
means you don't have to substitute ${PV} and so on yourself, let alone
replicate all the stuff versionator and its ilk do. And of course you
can always just fetch from upstream anyway if you do have a clean URI.

--
Rich